
In my Azure role, which runs C# code inside a 64-bit process, I want to download a ZIP file and unpack it as fast as possible. I figured I could do the following: create a MemoryStream instance, download to that MemoryStream, pass the stream to some ZIP handling library for unpacking, and discard the stream once unpacking is done. This way I would get rid of the write-read-write sequence that unnecessarily performs a lot of I/O.
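For concreteness, the plan described above might be sketched as follows. This is only an illustration, not code from the question: it assumes .NET 4.5 or later, where `HttpClient` and `System.IO.Compression.ZipArchive` are available, and the names `InMemoryUnzip`, `DownloadAndExtractAsync`, and `ExtractTo` are hypothetical.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper illustrating "download to memory, then unpack".
static class InMemoryUnzip
{
    public static async Task DownloadAndExtractAsync(HttpClient client, string url, string targetDir)
    {
        using (var buffer = new MemoryStream())
        {
            // Read headers first so the body is streamed rather than buffered twice.
            using (var response = await client.GetAsync(url, HttpCompletionOption.ResponseHeadersRead))
            {
                response.EnsureSuccessStatusCode();
                using (var body = await response.Content.ReadAsStreamAsync())
                {
                    await body.CopyToAsync(buffer);
                }
            }

            // Rewind before handing the stream to the ZIP reader.
            buffer.Position = 0;
            ExtractTo(buffer, targetDir);
        }
    }

    // Extracts every entry of a seekable ZIP stream under targetDir.
    public static void ExtractTo(Stream zipStream, string targetDir)
    {
        using (var archive = new ZipArchive(zipStream, ZipArchiveMode.Read))
        {
            archive.ExtractToDirectory(targetDir);
        }
    }
}
```

The only disk writes in this sketch are the extracted files themselves; the archive lives in the `MemoryStream` until the outer `using` block ends.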

However, I've read that MemoryStream is backed by an array, and at half a gigabyte that array will definitely be considered a "large object" and will be allocated in the large object heap, which isn't compacted on garbage collection. That makes me worried that this usage of MemoryStream will fragment the process memory and have negative long-term effects.

Will this likely have any long-term negative effects on my process?

sharptooth

1 Answer


The answer is in the accepted answer to the question you linked to. Thanks for providing the reference.

The real problem is assuming that a program should be allowed to consume all virtual memory at any time. A problem that otherwise disappears completely by just running the code on a 64-bit operating system.

I would say that if this is a 64-bit process, you have nothing to worry about.

The hole that is created only leads to fragmentation of the virtual address space of the LOH. Fragmentation here isn't a big problem for you. In a 64-bit process, any whole pages wasted due to fragmentation simply become unused, and the physical memory they were mapped to becomes available again for mapping new pages. Very few partial pages will be wasted, because these are large allocations. And locality of reference (the other advantage of defragmentation) is mostly preserved, again because these are large allocations.
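If you still want to keep the number of large allocations down, one option is to size the MemoryStream up front. A MemoryStream that starts empty grows its backing array by reallocating as data arrives, so a large download leaves several abandoned large arrays behind, whereas a presized stream allocates one. The following is a minimal sketch under assumptions: `LohSketch`, `CreatePresized`, and the 1 GiB cap are hypothetical, and the length would typically come from the response's Content-Length header, which may be missing.

```csharp
using System;
using System.IO;

// Hypothetical helper: allocate the backing array once when the size is known.
static class LohSketch
{
    // Arbitrary illustrative cap (1 GiB); above it we fall back to a growable stream.
    private const long MaxPresize = 1L << 30;

    public static MemoryStream CreatePresized(long? contentLength)
    {
        if (contentLength.HasValue && contentLength.Value > 0 && contentLength.Value <= MaxPresize)
        {
            // One array of exactly the expected size, so no regrowth while downloading.
            return new MemoryStream((int)contentLength.Value);
        }

        // Unknown or oversized length: default stream that grows on demand.
        return new MemoryStream();
    }
}
```

This does not change the conclusion above; it only reduces how many holes a single download can leave.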

John Watts
  • By the way, I still don't think this is the right solution to your original problem. You ought to be able to decompress the zip on the fly. Doesn't the download give you a stream? And you can decompress from a stream to a stream. – John Watts Aug 09 '12 at 11:40
  • Are you aware of a library that can unpack a ZIP in one read and has a reasonably permissive license? – sharptooth Aug 09 '12 at 12:09
  • I'm coming from a Java world and I don't know all the equivalents but I know you can read a gzip stream in one pass. Any chance you could switch? Are there lots of files in the zip? Do you need to read all of them? – John Watts Aug 09 '12 at 13:03
  • There's a whole tree of files and that's why I need ZIP - if that was a single file I would just store it as is. – sharptooth Aug 09 '12 at 14:09
  • If you don't even need compression, I recommend tar. You'll still need a third-party library. Or you could make your own archive format: \0... – John Watts Aug 09 '12 at 17:56
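As an illustration of the one-pass gzip reading mentioned in the comments, a stream-to-stream decompression in C# could look roughly like this. It is a sketch only: it uses the built-in `GZipStream`, the name `GzipPassThrough` is hypothetical, and it handles a single gzip stream, not a ZIP archive with multiple entries.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical helper: decompress gzip data from one stream into another in a single pass.
static class GzipPassThrough
{
    public static void Decompress(Stream compressed, Stream destination)
    {
        // leaveOpen: true so the caller keeps ownership of the source stream.
        using (var gzip = new GZipStream(compressed, CompressionMode.Decompress, leaveOpen: true))
        {
            // Data is copied in small chunks; the full payload is never buffered here.
            gzip.CopyTo(destination);
        }
    }
}
```

The source can be a non-seekable network stream, which is why this form avoids holding the whole download in memory.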