{
  "WorkItem": {
    "AffectedComponent": {
      "Name": "",
      "DisplayName": ""
    },
    "ClosedComment": "Fixed in changeset 53120. First binary: v1.9.0.34",
    "ClosedDate": "2009-12-26T22:08:53.377-08:00",
    "CommentCount": 0,
    "Custom": null,
    "Description": "This is version 1.8...we have zip files with ~30,000 entries, and the time to add entries becomes 10X slower towards the end. So much slower that it turned out it took more time calling AddEntry() than it did to compress all of the data!\n \nWe did a bit of analysis and discovered that:\na. _entities is a List<> collection\nb. Adding a new entry calls InsureUniqueEntry(), which does a linear search through the collection looking for duplicates. 100% of the time this search is guaranteed to fail (unless you are adding duplicates)\nc. InsureUniqueEntry() calls TrimVolumeAndSwapSlashes() on the source filename inside of the loop, when it only needs to be done once, outside the loop\n \nProposed fixes:\n1. move TrimVolumeAndSwapSlashes() outside of the loop\n2. change List<ZipEntry> _entities to Dictionary<string, ZipEntry> _entities so that you can simply replace the loops with an if (!_entities.ContainsKey(filename)) check",
    "LastUpdatedDate": "2013-05-16T05:31:58.03-07:00",
    "PlannedForRelease": "",
    "ReleaseVisibleToPublic": false,
    "Priority": {
      "Name": "Low",
      "Severity": 50,
      "Id": 1
    },
    "ProjectName": "DotNetZip",
    "ReportedDate": "2009-12-06T10:28:08.593-08:00",
    "Status": {
      "Name": "Closed",
      "Id": 4
    },
    "ReasonClosed": {
      "Name": "Unassigned"
    },
    "Summary": "Using a List<ZipEntry> _entities collection has scaling problems with large numbers of files in a zip",
    "Type": {
      "Name": "Issue",
      "Id": 3
    },
    "VoteCount": 1,
    "Id": 9596
  },
  "FileAttachments": [],
  "Comments": [
    {
      "Message": "Thanks for the report. \r\nYes, I have been aware of the scaling issues you describe for some time. My tests show that the majority of the cost is in InsureUniqueEntry(). I agree with your point that it is guaranteed to do nothing unless there are duplicates. In fact, that is what the call is intended to do - protect against adding duplicates. \r\n\r\nI think your other observation re: TrimVolumeAndSwapSlashes is a good one, too. \r\n\r\nI've got to think about how to fix the uniqueness check. Making it a Dictionary and using the ContainsKey method may be an option. I'll have to check the performance of it. With a dictionary containing 30,000 items, it may still be very slow. \r\n",
      "PostedDate": "2009-12-07T06:23:24.703-08:00",
      "Id": -2147483648
    },
    {
      "Message": "One option might be to simply remove the uniqueness check completely. \r\n",
      "PostedDate": "2009-12-07T06:29:51.247-08:00",
      "Id": -2147483648
    },
    {
      "Message": "I have coded it up using a Dictionary, and the time becomes flat. For the most part it's really easy, as you just replace all of your foreach (var item in _entries) with foreach (var item in _entries.Values).\r\n\r\nThe only weird thing is that there is no AsReadOnly() method on dictionary.Values, so the methods which return that require you to build the list on the fly. I'm not certain if that is the intent of AsReadOnly() (aka, does it give you a dynamically changing collection which is read-only?)\r\n\r\nThe other way to do it would be to use a Bloom filter (http://bloomfilter.codeplex.com/ ), which gives you a super fast way of checking a negative\r\n",
      "PostedDate": "2009-12-07T15:58:29.12-08:00",
      "Id": -2147483648
    },
    {
      "Message": "Oh, BTW: I said that I tried the dictionary approach, but here are raw numbers: each 100 items took ~25ms, all the way up to 35,000 items, for about 7 seconds total, as opposed to the existing code taking 71 seconds to add 35,000 items.",
      "PostedDate": "2009-12-08T09:41:40.837-08:00",
      "Id": -2147483648
    },
    {
      "Message": "I ran a quick test and had a similar gain: 18s versus 200s to add 35,000 entries. I am currently running through the unit tests for DotNetZip. ",
      "PostedDate": "2009-12-26T14:52:04.67-08:00",
      "Id": -2147483648
    },
    {
      "Message": "",
      "PostedDate": "2009-12-26T22:08:53.377-08:00",
      "Id": -2147483648
    },
    {
      "Message": "",
      "PostedDate": "2013-02-21T18:43:46.43-08:00",
      "Id": -2147483648
    },
    {
      "Message": "",
      "PostedDate": "2013-05-16T05:31:58.03-07:00",
      "Id": -2147483648
    }
  ]
}