
I may NOT bother with this, but if it's very simple I may consider it. The site I am working on is designed to hold hundreds of thousands of files. I don't know yet whether we'll offer one download option or several. Right now the choices are: A) just the file, or B) an archive containing the file plus the license and conditions.

I am trying to figure out whether it can be efficient to offer both by doing something like a file open/read, prefixing an archive header before the file and appending the license and the rest of the zip structure after it. My biggest worries are that doing the open/read myself will not be as efficient as letting the server transmit the file directly, and that it may be hard to generate and change the contents of the zip dynamically (if a user wishes to change the license, or if we want to add other data such as an author description, author URL, and a permalink on the site).

Is it efficient, and how would I create the archive dynamically from only the original file and data pulled from the database?

PS: I am using Debian/Apache/ASP.NET with XSP and Mono.

2 Answers


SharpZipLib is a very nice stream-based library that you can use to create archive files.
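A minimal sketch of that approach, streaming the original file plus a license entry straight to an output stream. Assumptions: the `ArchiveWriter` class, the `LICENSE.txt` entry name, and the `licenseText` parameter (pulled from your database) are all illustrative, not part of SharpZipLib:

```csharp
using System.IO;
using System.Text;
using ICSharpCode.SharpZipLib.Zip;

public static class ArchiveWriter
{
    // Streams a zip (original file + license entry) to any output stream,
    // e.g. the HTTP response stream, without building the archive on disk.
    public static void Write(Stream output, string filePath, string licenseText)
    {
        ZipOutputStream zip = new ZipOutputStream(output);
        zip.IsStreamOwner = false; // don't close the response stream for us
        zip.SetLevel(0);           // 0 = store only, no DEFLATE compression

        // Entry 1: the original file, copied in chunks.
        zip.PutNextEntry(new ZipEntry(Path.GetFileName(filePath)));
        using (FileStream fs = File.OpenRead(filePath))
        {
            byte[] buffer = new byte[64 * 1024];
            int read;
            while ((read = fs.Read(buffer, 0, buffer.Length)) > 0)
                zip.Write(buffer, 0, read);
        }
        zip.CloseEntry();

        // Entry 2: the license/conditions text pulled from the database.
        zip.PutNextEntry(new ZipEntry("LICENSE.txt"));
        byte[] license = Encoding.UTF8.GetBytes(licenseText);
        zip.Write(license, 0, license.Length);
        zip.CloseEntry();

        zip.Finish(); // writes the zip central directory
        zip.Close();
    }
}
```

Because the library is stream-based, changing the license or adding author metadata is just a matter of writing a different text entry per request.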

spender
  • I gave it a quick try and it seems to take 10x longer than reading the file and sticking it into a MemoryStream. I set the compression level to 0; it doesn't look like it's compressing, but I wonder why it's taking that much longer. Surely CRC32-ing it doesn't take that long –  Oct 21 '10 at 11:13

You can use zip libraries (System.IO.Packaging.ZipPackage, DotNetZip, SharpZipLib) or even command-line programs (say, 7-Zip) to compress the file, as in the sketch below. A library should offer better performance.
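For comparison, the same archive built with DotNetZip might look like this (the `DotNetZipWriter` class, entry name, and `licenseText` parameter are illustrative assumptions):

```csharp
using System.IO;
using Ionic.Zip;   // DotNetZip
using Ionic.Zlib;

public static class DotNetZipWriter
{
    // Builds the same archive (original file + license entry) with DotNetZip.
    public static void Write(Stream output, string filePath, string licenseText)
    {
        using (ZipFile zip = new ZipFile())
        {
            // Skip DEFLATE if you only want the zip container, not compression.
            zip.CompressionLevel = CompressionLevel.None;

            zip.AddFile(filePath, "");                // store at archive root
            zip.AddEntry("LICENSE.txt", licenseText); // text from the database

            zip.Save(output); // e.g. Response.OutputStream
        }
    }
}
```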

However, the important thing will be to add a caching layer, i.e. generated zip files should be cached in the file system so that they can be served directly when a request comes in for the same file.
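A rough sketch of that caching layer; the `cacheDir`/`fileId` scheme and the invalidation policy are up to you (delete the cached zip whenever the license or metadata changes so it gets rebuilt):

```csharp
using System.IO;
using Ionic.Zip;
using Ionic.Zlib;

public static class ZipCache
{
    // Returns the path of a cached archive, building it on first request.
    public static string GetOrCreate(string cacheDir, string fileId,
                                     string sourcePath, string licenseText)
    {
        string cachePath = Path.Combine(cacheDir, fileId + ".zip");
        if (!File.Exists(cachePath))
        {
            using (ZipFile zip = new ZipFile())
            {
                zip.CompressionLevel = CompressionLevel.None;
                zip.AddFile(sourcePath, "");
                zip.AddEntry("LICENSE.txt", licenseText);
                zip.Save(cachePath);
            }
        }
        return cachePath;
    }
}
```

Once the zip sits on disk you can hand it off with something like Response.TransmitFile(cachePath), so the CRC/zip work is paid only on the first request and repeat downloads run at ordinary static-file speed.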

VinayC
  • DotNetZip compressed way faster than SharpZipLib. However, I was hoping for something that gives me nearly the same speed as regular IO transfers; since there must be a CRC32 checksum, maybe that is what is slowing things down. I might consider using DotNetZip if I do this. DotNetZip was roughly 2-2.5x slower than raw IO, which is pretty fast. –  Oct 21 '10 at 12:41
  • The thing that slows DotNetZip is the DEFLATE algorithm, not the CRC. DEFLATE is CPU-intensive and necessarily "slow". If you don't care to compress, you can turn it off by setting the CompressionLevel to 0. If you want to try a TAR archive, I wrote a class that creates TAR **files**; you could modify it to handle TAR out to streams. http://cheeso.members.winisp.net/srcview.aspx?dir=Tar&file=Tar.cs Regarding "slow": you have to consider latency vs. throughput. At scale, DotNetZip will still perform very well, though TTFB may be slower than "raw IO". – Cheeso Oct 24 '10 at 16:32
  • @Cheeso: SharpZipLib was slow even with compression off; DotNetZip was way faster. I may try modifying this, though it may be a few months until I need it. –  Oct 31 '10 at 22:23