[
  {
    "Id": "253642",
    "ThreadId": "74148",
    "Html": "<p>I just posted DotNetZip v1.9.0.29, which includes an update to speed up compression. It uses multiple threads to perform the compression, resulting in a significant performance gain in some cases.</p>\r\n<p>The chart below shows the time in seconds required to compress a directory that contained about 1600 files, totalling about 230mb - text, images, music, and other stuff. I ran hundreds of trials to analyze the performance, and these are the results. What you see is 4 broad bands or ranges. Starting from the top are all the trials corresponding to the ZipOutputStream class. Beneath that band, showing significantly better performance, is the band for ZipOutputStream using multiple threads. (Shorter bars mean less time, better performance.) Then comes the band for the ZipFile class, and then ZipFile with multiple threads. All these tests were conducted on my laptop - a three-year-old Compaq nc8430 with a Core Duo processor.</p>\r\n<p>As you can see, the use of multiple threads on my laptop increased performance significantly. The average time to compress the directory without using multiple threads was about 40 seconds. Using threads, that average went down to as low as 23 seconds, with a particular combination of buffer sizes. The bottom line is, using multiple threads can significantly increase the speed of compression.</p>\r\n<p><img src=\"http://i38.tinypic.com/2lv1v6s.jpg\" alt=\"\"></p>\r\n<p>Don't forget - you get this performance for free. There's no code change required. All you have to do is get v1.9.0.29. The performance gain comes from an improvement I made within the library. Your zip operations should get faster, automatically. Also, the compression effectiveness does not change significantly. It still compresses &quot;about the same&quot; as a regular deflate. There may be a slight increase in size to go along with this increase in speed - but it should be in the range of 0.1% to 5%. And it's still completely compatible with all the other zip tools. It still uses the DEFLATE algorithm. It's just faster.</p>\r\n<p>How did I do it? I split up large files into chunks, and compress the chunks independently. Each chunk is compressed on a background thread (from the .NET threadpool). If you look at a CPU monitor on a multi-CPU or multi-core computer when zipping with DotNetZip in versions prior to v1.9.0.29, typically you will see one processor totally maxed out, and the other processor or processors doing relatively nothing. This is because there was just one thread doing all the work. Now, with the multi-threaded design, all of the processors or cores can be used to perform the compression. The 45% drop in compression time might be even larger on a 4-CPU machine, as many desktops are today. On an 8-core machine, as many servers are, you might see a 70% drop in compression time. I don't know, because I don't have an appropriate machine to test this on.</p>\r\n<p>Just for fun, I also did some comparisons with SharpZipLib, which is an alternative zip library for .NET. SharpZipLib is pretty fast - the FastZip interface was zipping the same directory in about 35 seconds. DotNetZip without multiple threads does it in about 39 seconds. But with multiple threads, DotNetZip's best time got down to 21 seconds. Pretty nice gain.</p>\r\n<p>Now, what about the variations within the bands? What is that all about? In DotNetZip, there are two classes of buffers used internally - those used for regular IO (the IO buffer, used for reading and writing files or streams), and those used for the compressor (the codec buffer). The former can be set with the BufferSize property on the ZipFile or ZipOutputStream; the latter can be set with the CodecBufferSize property. 
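</p>\r\n<p>For example, here is a minimal sketch of setting both buffer sizes on a ZipFile instance (the directory name, archive name, and buffer sizes are just illustrative - tune them for your own workload):</p>\r\n<pre>\r\nusing (var zip = new ZipFile())\r\n{\r\n    zip.BufferSize = 64 * 1024;       // IO buffer: reading and writing files or streams\r\n    zip.CodecBufferSize = 32 * 1024;  // codec buffer: handed to the compressor\r\n    zip.AddDirectory(&quot;data&quot;);\r\n    zip.Save(&quot;archive.zip&quot;);\r\n}\r\n</pre>\r\n<p>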
I wanted to find out if varying the sizes of those buffers would have a significant impact on compression speed. It turns out that there is an effect, though it is much smaller than the effect associated with the use of multiple threads. There was a variation of 10-20% between the best and worst combinations of buffer sizes.</p>\r\n<p>For my test situation, the best times came with an IO buffer of 32k or 64k, across a wide range of codec buffer sizes.</p>\r\n<p>Here's a closer look at one of the bands. All of the bars in the following chart pertain to the ZipOutputStream class.</p>\r\n<p><img src=\"http://i37.tinypic.com/rqx9p4.jpg\" alt=\"\"></p>\r\n<p>This one shows that using buffer sizes of 1mb on my machine led to poor performance, as did very small buffers of 2k. The sweet spot was an IO buffer size between 8k and 64k, and a codec buffer size between 4k and 128k. You can see some noise in the chart - that's probably because this was not a dedicated benchmark machine. It was my laptop, and I was using it while the tests were running. I also think a virus scan ran at some point in there, which explains the spikes in compression time that are evident. I ran over 800 trials over the course of a day.</p>\r\n<hr>\r\n<p>These results will vary depending on the kinds of files you compress, the amount of memory on your system, the speed of your I/O subsystem, the speed of memory access, and the compression level you use. (All these results used the default compression level. In a few tests I found smaller performance gains from multiple threads when using lower compression levels.) 
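</p>\r\n<p>If you want to experiment with the compression level yourself, it can be set directly on the ZipFile instance - a sketch, with illustrative names:</p>\r\n<pre>\r\nusing (var zip = new ZipFile())\r\n{\r\n    // Trade compression ratio for speed; the default is CompressionLevel.Default\r\n    zip.CompressionLevel = Ionic.Zlib.CompressionLevel.BestSpeed;\r\n    zip.AddDirectory(&quot;data&quot;);\r\n    zip.Save(&quot;archive-fast.zip&quot;);\r\n}\r\n</pre>\r\n<p>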
I only publish these charts to show you the relative importance of the buffer sizes versus the multi-threaded approach.</p>\r\n<p>There is a limitation - the performance gains in compression are nice, but when you also encrypt your zip entries, you get the &quot;old&quot; performance. This is because encryption is a CPU-intensive task as well, and I haven't yet updated the library to use multiple threads when encrypting. As a result, everything slows down to the slowest part of the system, which is encryption. In the future I hope to be able to speed that up as well.</p>\r\n<p>Any questions, let me know.</p>",
    "PostedDate": "2009-11-05T00:10:57.513-08:00",
    "UserRole": null,
    "MarkedAsAnswerDate": null
  },
  {
    "Id": "478775",
    "ThreadId": "74148",
    "Html": "<p>Cheeso,</p>\r\n<p>Thanks for posting this.</p>\r\n<p>(1) When I initialize a ZipFile instance, the default for CodecBufferSize is 0. Is this intended?</p>\r\n<p>(2) Is CodecBufferSize a buffer for the uncompressed data that is going to be compressed? I.e., if size = 2048, does compression not take place until 2048 bytes of data have been written?</p>",
    "PostedDate": "2010-08-10T12:48:56.977-07:00",
    "UserRole": null,
    "MarkedAsAnswerDate": null
  },
  {
    "Id": "479750",
    "ThreadId": "74148",
    "Html": "<p>The 0 for CodecBufferSize is a marker for &quot;use the default value&quot;. I don't believe the actual default value is currently published or documented, but I may be wrong about that.</p>\r\n<p>The way the codec buffer works: using the IO buffer, the library tries to read N bytes from the input file or stream; depending on the IO conditions, that read may be fully satisfied, partially satisfied, or not satisfied at all (zero bytes read). If the read is fully or partially satisfied, then DotNetZip will compress that portion of data. If zero bytes are read, that indicates EOF; DotNetZip concludes that the file or stream has been read fully, and stops.</p>\r\n<p>DotNetZip then does chunkwise compression of the data in the IO buffer, in this way: it copies a chunk into the codec buffer, compresses it, then copies the result to the output stream. Therefore, if your codec buffer is 2048 bytes, DotNetZip will try to compress 2048 bytes at a time.</p>\r\n<p>So when the stream is larger than 2048 bytes, if your codec buffer is 2048 bytes, then YES, compression does not take place until you read/write 2048 bytes. If the stream is smaller than 2048 bytes, then compression occurs at input EOF.</p>\r\n<p>I caution you <strong>not</strong> to draw performance conclusions from this description of how things work. The behavior of a computer program that uses DotNetZip depends on many factors. Really, the only way to optimize the performance of such a complex system is to try it and test it. With the various settings, DotNetZip gives you the chance to optimize things as appropriate for your usage.</p>\r\n<p>Good luck.</p>",
    "PostedDate": "2010-08-12T08:30:26.037-07:00",
    "UserRole": null,
    "MarkedAsAnswerDate": null
  }
]