[
  {
    "Id": "252227",
    "ThreadId": "73763",
    "Html": "<p>Hello to everybody!</p>\r\n<p>I'm writing a class in C# for a backup utility with incremental backup option, but my strategy is terribly slow. How I can optimize it? Am I doing something wrong?</p>\r\n<p>For my tests, I used:</p>\r\n<ul>\r\n<li>a folder (the Thuderbird Profile folder; mine is about 660MB on disk; up to 6500 files and 86 folders);</li>\r\n<li>a Zipped file of Thuderbird Profile folder, but not updated</li>\r\n</ul>\r\n<p>First and complete zipping takes 3 minutes and 23 seconds (compressed zip is about 340MB)</p>\r\n<p>Incremental (and partial) backup takes 2 minutes and 35 seconds (compressed zip of incremental backup is about 160MB): it loses one minutes and 17 seconds to compare the files and its ModifiedTime (so, one minutes and 17 seconds before Save() method is invoked).</p>\r\n<p>Well, now let me show you my code:</p>\r\n<div style=\"color:Black;background-color:White\">\r\n<pre>ZipFile OldZip = ZipFile.Read(nameOfZipFile);<br>ZipFile NewZip = <span style=\"color:Blue\">new</span> ZipFile();<br><br>NewZip.UseZip64WhenSaving = Zip64Option.AsNecessary;<br>NewZip.CompressionLevel = CompressionLevel.Level2;<br>NewZip.UseUnicodeAsNecessary = <span style=\"color:Blue\">true</span>;<br><br><span style=\"color:Blue\">foreach</span> (FileInfo currentFileOnDisk <span style=\"color:Blue\">in</span> FileList)<br>{<br>   ZipEntry candidateZipEntry = OldZip[currentFileOnDisk.FullName.Replace(mailclientsClass.thunderbirdProfilePath, <span style=\"color:#A31515\">&quot;ThunderbirdProfile&quot;</span>)];<br><span style=\"color:Blue\">   if</span> (candidateZipEntry != <span style=\"color:Blue\">null</span> &amp;&amp; currentFileOnDisk.LastWriteTimeUtc &gt; candidateZipEntry.ModifiedTime)<span style=\"color:Green\">&nbsp;</span><br>     NewZip.AddFile(currentFileOnDisk.FullName, Path.GetDirectoryName(currentFileOnDisk.FullName.Replace(mailclientsClass.thunderbirdProfilePath, <span style=\"color:#A31515\">&quot;ThunderbirdProfile&quot;</span>)));<br><span style=\"color:Blue\">   else</span> <span style=\"color:Blue\">if</span> (candidateZipEntry == <span style=\"color:Blue\">null</span>)<br>     NewZip.AddFile(currentFileOnDisk.FullName, Path.GetDirectoryName(currentFileOnDisk.FullName.Replace(mailclientsClass.thunderbirdProfilePath, <span style=\"color:#A31515\">&quot;ThunderbirdProfile&quot;</span>)));<br>}<br><br>NewZip.Save(nameOfZipFile+incrementalCounter.ToString());<br><br></pre>\r\n</div>\r\n<p>Yes, I know, this code is not complete for a good incremental backup strategy, because I need more istructions to check removed files &amp; C. but for now I'm in trouble for its slowness. Is it possible to optimize it?</p>\r\n<p>&nbsp;</p>\r\n<p>Many thanks!</p>\r\n<p>Bye,</p>\r\n<p>Alessandro.</p>",
    "PostedDate": "2009-11-01T16:58:55.19-08:00",
    "UserRole": null,
    "MarkedAsAnswerDate": null
  },
  {
    "Id": "253150",
    "ThreadId": "73763",
    "Html": "<table border=0 width=800>\r\n<tbody>\r\n<tr>\r\n<td>\r\n<p>Ciao Alessandro,</p>\r\n<p>I have a couple of suggestions.</p>\r\n<ol>\r\n<li>The <a href=\"http://cheeso.members.winisp.net/DotNetZipHelp/html/776a5035-37e3-4fb2-d76e-0a52e1421581.htm\">ZipOutputSteam</a> can be slightly faster to use than the ZipFile class.&nbsp; In my tests on a selection of regular filesystem files, the difference is about 5%-10%.&nbsp; It is less powerful though, and cannot update zip files.&nbsp; The compression is equivalent.&nbsp; You may want to check it out.</li>\r\n<li>you can fiddle with the <a href=\"http://cheeso.members.winisp.net/DotNetZipHelp/html/bccfc53c-6132-c9bc-00f5-b31136e712b5.htm\">CodecBufferSize</a> and <a href=\"http://cheeso.members.winisp.net/DotNetZipHelp/html/2c08ec0b-e3af-7125-8cdc-3bb7f406343b.htm\">BufferSize</a> properties on the ZipFile instance.&nbsp; These can significantly affect performance.&nbsp; The optimal setting depends on the sizes of the files you are zipping, the memory you have available, and the relative speed of your disk access.&nbsp; I try to make a good guess for what they should be, but you may find some improvements by altering them.</li>\r\n<li>In v1.9.0.29,&nbsp;which I have just released, there is a multi-threaded deflate&nbsp;implementation, which can cut the time to compress a file by 45% on&nbsp;a simple dual-core laptop, or more on a 4p machine.&nbsp; There is one new programming interface - the property called <a href=\"http://cheeso.members.winisp.net/DotNetZipHelp/html/dd2291bf-1c5a-8f1b-cfa6-34239e73ad13.htm\">ParallelDeflateThreshold</a>., which allows you to&nbsp;set the filesize for which multi-threaded deflate is used.&nbsp; The multi-threaded deflate is beneficial when the file being compressed is larger than about 300k.&nbsp; Below that and it is detrimental.&nbsp; But this varies by machine.&nbsp; You can set the threshold as appropriate for your scenario.&nbsp; This multi-threaded deflate also works with ZipOutputStream.&nbsp; It tends to be less effective at lower levels of compression. </li>\r\n<li>You may also save some time by doing a File.GetLastWriteUtc instead of constructing a FileInfo on each file.&nbsp; FileInfo contains other information that you may not need, and it may be faster to simply get the modification time, if that is all you want. </li>\r\n</ol>\r\n<p>&nbsp;</p>\r\n</td>\r\n</tr>\r\n</tbody>\r\n</table>",
    "PostedDate": "2009-11-04T01:22:28.52-08:00",
    "UserRole": null,
    "MarkedAsAnswerDate": null
  },
  {
    "Id": "253665",
    "ThreadId": "73763",
    "Html": "<p>Wow! Wow! Wow! Mr. Dino, the 1.9.0.29 is really amazing!</p>\r\n<p>I repeated my test (my PC is a DELL Inspiron 6400 with a Core 2 Duo processor) and it takes only 1 minutes and 52 seconds with a gain of 80% (Yes, I know: you will not believe me, but the 1.8.4.26 completed the test in 3 minutes and 23 seconds).</p>\r\n<p>I added only these options:</p>\r\n<p>&nbsp;</p>\r\n<div style=\"color:Black;background-color:White\">\r\n<pre>NewZip.ParallelDeflateThreshold = 2097152; <span style=\"color:Green\">//2MB</span>\r\nNewZip.BufferSize = 65536 * 8;\r\n</pre>\r\n</div>\r\n<p>&nbsp;</p>\r\n<p>But I will do more tests, because I think that I can get more performance from your library! For example, I will try ZipOutputStream and your precious advices.</p>\r\n<p>I'm going to use a catalog in SQLite too, to trace the last modified files (now I open each ZIP and I check the modified time of each zip entry).</p>\r\n<p>&nbsp;</p>\r\n<p>P.S: I will keep you informed if I will get more performance with your advices!</p>\r\n<p>Bye,</p>\r\n<p>and...</p>\r\n<div style=\"color:Black;background-color:White\">\r\n<pre><span style=\"color:Blue\">while</span>(<span style=\"color:Blue\">true</span>)<br><br>{<br><br>    Console.WriteLine(<span style=\"color:#A31515\">&quot;THANKS SO MUCH!&quot;</span>);<br><br>}<br></pre>\r\n</div>\r\n<p>&nbsp;</p>",
    "PostedDate": "2009-11-05T01:38:35.103-08:00",
    "UserRole": null,
    "MarkedAsAnswerDate": null
  },
  {
    "Id": "253699",
    "ThreadId": "73763",
    "Html": "<table border=0 width=800>\r\n<tbody>\r\n<tr>\r\n<td>\r\n<p>Wow, that's a nice gain in performance.</p>\r\n<p>Listen, Alex, about the ParallelDeflateThreshold - it's a new part of DotNetZip, maybe I haven't documented it clearly enough.&nbsp; I want to make sure you understand what it is doing.</p>\r\n<p>By setting it to 2mb, you are saying that only files larger than 2mb should be deflated with multiple threads.&nbsp; In my experience, on my machine, any file larger than 512k showed a significant performance advantage when using the parallel deflate.&nbsp; So you may get more of a gain if you lower that number from 2mb to maybe 1mb or 875k.&nbsp; But maybe you already tested it and found that 2mb is the right number.</p>\r\n<p>Also see <a href=\"http://dotnetzip.codeplex.com/Thread/View.aspx?ThreadId=74148\">my other post</a> about the performance analysis of v1.9.0.29.&nbsp; It shows the effects I measured when modifying the CodecBufferSize and (IO) BufferSize .</p>\r\n<p>Good luck!</p>\r\n<p>&nbsp;ps: the Parallel deflate capability is an ON/OFF thing for ZipOutputStream.&nbsp; On the ZipFile, the library knows the size of the files it is zipping, and so it can make a decision on whether to use the parallel deflate based on the size of the file.&nbsp; In the ZipOutputStream, because of the different programming model, there's no way for the library to know how large the file is going to be, before it is written.&nbsp; So, with ZipOutputStream, the only meaningful values for the ParallelDeflateThreshold property are 0 (always use parallel deflate) or -1L (never).&nbsp; Any other value implies (never).</p>\r\n<p>One last thing: be sure you are using RELEASE dlls for DotNetZip.&nbsp; The difference between&nbsp; a DEBUG dll and a RELEASE dll can be 40-50%.&nbsp; I have both RELEASE and DEBUG dlls on the downloads&nbsp;page.&nbsp; For performance, you want RELEASE.</p>\r\n<p>&nbsp;</p>\r\n</td>\r\n</tr>\r\n</tbody>\r\n</table>",
    "PostedDate": "2009-11-05T03:21:00.413-08:00",
    "UserRole": null,
    "MarkedAsAnswerDate": null
  },
  {
    "Id": "254188",
    "ThreadId": "73763",
    "Html": "<p>Thanks so much for explanations!</p>\r\n<p>Yesterday I tried ParallelDeflateThreshold set to 2MB for a quick test, but after I read your explanations, I tried ParallelDeflateThreshold property set to&nbsp;512KB.</p>\r\n<p>I tried&nbsp;also ParallelDeflateThreshold set to 750KB but I didn't observe any difference, instead with it set to 2MB, it seems to be slightly&nbsp;slower (2 seconds)</p>\r\n<p>I set up BufferSize to 128KB too. Now, with ParallelDeflateThreshold set to 512KB and BufferSize set to 128KB,&nbsp;it seems to have the best performance on my machine.&nbsp;</p>\r\n<p>Thanks to your DotNetZip 1.9, now my program is faster then KLS MailBackup. You have done a really great job with parallel deflate!</p>\r\n<p>P.S: I downloaded this file: <a id=\"ctl00_ctl00_MasterContent_Content_ReleasePanelOrderingContainer_ReleaseFilesCtrl_FileList_ctl01_FileNameHyperLink\" style=\"text-decoration:underline;color:#3e62a6\" tabindex=9 href=\"http://dotnetzip.codeplex.com/Release/ProjectReleases.aspx?ReleaseId=27890#DownloadId=81012\">DotNetZipLib-Runtime-v1.9.zip</a>&nbsp;and I use the Reduced DLL. I think that it is the RELEASE version, isn't?</p>\r\n<p>Thanks so much for your job!</p>",
    "PostedDate": "2009-11-06T03:22:42.087-08:00",
    "UserRole": null,
    "MarkedAsAnswerDate": null
  },
  {
    "Id": "254201",
    "ThreadId": "73763",
    "Html": "<p>I'm glad it's working for you!</p>\r\n<p>About the RELEASE version - yes, you have the right one. In the Devkit download, there is a DEBUG dll and a RELEASE dll . The Runtime download, which is the one you grabbed, includes only the RELEASE version.</p>\r\n<p>&nbsp;</p>",
    "PostedDate": "2009-11-06T04:12:46.68-08:00",
    "UserRole": null,
    "MarkedAsAnswerDate": null
  }
]