{
  "WorkItem": {
    "AffectedComponent": {
      "Name": "",
      "DisplayName": ""
    },
    "ClosedComment": "implemented in changeset 45892.  First binary v1.9.0.29",
    "ClosedDate": "2009-11-04T01:25:51.857-08:00",
    "CommentCount": 0,
    "Custom": null,
    "Description": "DotNetZip uses only a single CPU on a multi-CPU machine when zipping up archives, and only a single core on multi-core machines.\n \nCreation of ZIP files with this library is a CPU-bound operation.  The compression and encryption are the CPU-intensive bits.  The DEFLATE algorithm repeatedly scans through the data to be compressed, looking for repeated sequences in the data.  It is effectively a search.  But not only does it look for repeated sequences, it searches for the longest repeated sequences, which means repeated scans through the data.  In contrast, the encryption is not as expensive.  And relatively speaking, the IO done by DotNetZip is not significant as a performance limiter.  \n \nThere are a couple of ways to go about exploiting multi-core, multi-CPU machines: \n- parallelize the DEFLATE algorithm.  There is some research here; need to see if it is appropriate and practical to integrate into DotNetZip\n- use multiple threads to parallelize the processing of each file, then re-assemble the files back together into the zip archive.\n \nAnother option to explore for improving performance, independent of and complementary to this one, is to exploit a native-C++ version of ZLIB.  That is for another workitem.",
    "LastUpdatedDate": "2013-05-16T05:32:44.057-07:00",
    "PlannedForRelease": "1.9.1.5 DotNetZip",
    "ReleaseVisibleToPublic": true,
    "Priority": {
      "Name": "Low",
      "Severity": 50,
      "Id": 1
    },
    "ProjectName": "DotNetZip",
    "ReportedDate": "2008-06-07T12:38:20.877-07:00",
    "Status": {
      "Name": "Closed",
      "Id": 4
    },
    "ReasonClosed": {
      "Name": "Unassigned"
    },
    "Summary": "Parallel deflate, to optimize the Zip library to use multi-processors",
    "Type": {
      "Name": "Feature",
      "Id": 1
    },
    "VoteCount": 13,
    "Id": 5183
  },
  "FileAttachments": [],
  "Comments": [
    {
      "Message": "I considered building a parallelized DEFLATE algorithm, but that seemed pretty ambitious.  I preferred the approach of segmenting the incoming file and compressing each of the parts; this seemed more incremental and maintainable.  I searched around and found pigz, which is a parallel gzip written in C by Mark Adler, one of the co-authors of zlib.  It uses a pretty simple approach: just divide the incoming stream into blocks and compress each block independently in a separate thread, then serialize the write-out of the aggregated data.  \r\nUsing the same idea with the Ionic.Zlib compressor, I wrote some C# parallel deflate code and was able to nearly double the throughput for compressing a large file on my dual-core laptop.   With a parallel approach, the compressed data was 100.5% of the size of the single-thread approach, and it took roughly 56% of the time.  \r\n\r\nThis seems very promising.  \r\n\r\nThe parallel approach uses more memory, and is slightly less efficient.  It is slower for small files.  This is no surprise; there's overhead in thread scheduling.  I don't know the crossover point.  I need to look into selection of tuning params, including auto-tuning.  \r\n\r\nIt's not done yet but it looks promising for the next couple of weeks. \r\n",
      "PostedDate": "2009-03-21T22:15:16.163-07:00",
      "Id": -2147483648
    },
    {
      "Message": "ps: also, I found that by compiling the Release build, instead of the Debug build, I was able to improve performance by 50%, even without the parallel approach.  The perf improvement moving from Debug to Release is complementary to the perf improvement with threaded compression.  A large file that compresses in 31 seconds using the Debug/single-threaded approach compresses in 9 seconds if I use both the Release build and threaded compression. \r\n\r\noh, and \"looks promising for the next couple of weeks\" means, I expect to deliver this feature in the v1.8 preview within the next couple of weeks.\r\n",
      "PostedDate": "2009-03-21T22:20:20.287-07:00",
      "Id": -2147483648
    },
    {
      "Message": "An update. I now have a ParallelDeflateStream class that operates as described.   It uses a divide-and-conquer approach in compression: it splits the file into segments, and then compresses each segment individually on a separate thread.  It then writes out the ordered segments to the output stream.   This improves performance when compressing large files - say, over 10 MB.  So this is good. \r\n\r\nThinking about this further, this ParallelDeflateStream will not deliver performance improvements for any zipfile that does not have a significant number of large files.  Improving the DEFLATE performance for large files is good, but I also need to improve the performance of a Save() on *any* zipfile.  That means parallelizing the Save() operation, not just the deflate operation. This would also use the divide-and-conquer approach:  when saving, save each entry in a different thread, and then re-assemble them in order when it's time to write them out. \r\n\r\nI have more work to do. \r\n",
      "PostedDate": "2009-03-22T23:11:24.77-07:00",
      "Id": -2147483648
    },
    {
      "Message": "",
      "PostedDate": "2009-03-26T14:31:07.46-07:00",
      "Id": -2147483648
    },
    {
      "Message": "Parallelizing the ZipFile.Save() is a challenge.  There is a bunch of re-work involved in that.  Still looking at it.  ",
      "PostedDate": "2009-03-26T17:57:49.807-07:00",
      "Id": -2147483648
    },
    {
      "Message": "Keep this thread updated, this is very interesting and promising!\r\nI've moved all my C# tools from SharpZipLib to DotNetZip, which is now the best .NET zip compression library in the world :)\r\n\r\nThanks a lot",
      "PostedDate": "2009-03-28T00:57:39.263-07:00",
      "Id": -2147483648
    },
    {
      "Message": "",
      "PostedDate": "2009-04-04T05:40:57-07:00",
      "Id": -2147483648
    },
    {
      "Message": "",
      "PostedDate": "2009-05-25T12:22:38.46-07:00",
      "Id": -2147483648
    },
    {
      "Message": "",
      "PostedDate": "2009-05-26T17:17:52.923-07:00",
      "Id": -2147483648
    },
    {
      "Message": "",
      "PostedDate": "2009-05-27T06:02:49.807-07:00",
      "Id": -2147483648
    },
    {
      "Message": "",
      "PostedDate": "2009-06-19T11:23:46.1-07:00",
      "Id": -2147483648
    },
    {
      "Message": "",
      "PostedDate": "2009-08-16T19:38:27.103-07:00",
      "Id": -2147483648
    },
    {
      "Message": "",
      "PostedDate": "2009-08-28T13:43:58.467-07:00",
      "Id": -2147483648
    },
    {
      "Message": "",
      "PostedDate": "2009-08-28T18:14:55.917-07:00",
      "Id": -2147483648
    },
    {
      "Message": "It's a very long time since I updated this workitem.  Although I expected to finish with this a long time ago, many other things have crept in to delay the work.  Finally I have something to offer. \r\n\r\nI've decided to provide parallel deflate, and skip the parallel save, for now.  In my tests on large files - say, larger than 512k - the parallel option can cut the time required by 30-45%.  \r\n\r\nWhat used to take 3 seconds now takes 1.6.  What used to take 40 seconds now takes 21.  \r\n\r\nThere's no programming change at all.  It's just an implementation change within the library.  \r\n\r\nThere is one new element in the DotNetZip interface that you use to tune this capability: ZipFile.ParallelDeflateThreshold.  It is the minimum size, in bytes, of a stream that will be deflated using multiple threads.  By default it is 512k.  Any stream smaller than that will be deflated in the \"old\" manner.  Check the doc for full info.  \r\n\r\nThis will be available in v1.9.0.29, which I will post in a few hours, when the test run finishes. \r\n\r\nI may still implement the parallel save, as well.",
      "PostedDate": "2009-11-03T03:22:58.58-08:00",
      "Id": -2147483648
    },
    {
      "Message": "",
      "PostedDate": "2009-11-03T04:07:00.93-08:00",
      "Id": -2147483648
    },
    {
      "Message": "",
      "PostedDate": "2009-11-04T01:25:51.857-08:00",
      "Id": -2147483648
    },
    {
      "Message": "wow I'm pretty excited! even without .NET 4 this will be a significant improvement! good luck Cheeso!",
      "PostedDate": "2009-11-04T10:11:31.5-08:00",
      "Id": -2147483648
    },
    {
      "Message": "",
      "PostedDate": "2013-02-21T18:44:47.743-08:00",
      "Id": -2147483648
    },
    {
      "Message": "",
      "PostedDate": "2013-05-16T05:32:44.057-07:00",
      "Id": -2147483648
    }
  ]
}