
I have a Yesod application (though the question is more general than that) that allows file uploads. I also allow file downloads. I would like to let users download multiple files with a single link. As per this question: How to download multiple files with one HTTP request? the only solution seems to be creating a file archive with all the files inside.

I want to do it in constant memory in Haskell using libraries from Hackage, without writing to disk or executing external programs.

In particular the following are non-solutions:

  • calling external programs to create an archive: the files may be on disk, in some database, or accessible via some remote URL. The filesystem may be "read-only". Executing external programs may be impossible for security reasons, and external programs complicate deployment.

  • creating a temporary archive on disk from the source files: see the "read-only" filesystem above. It is also quite inefficient: writing to disk is slow.

  • creating the complete archive in memory and serving it afterwards: the files may be both large (think CD images) and numerous, so the memory required would be too great.

Tener
  • If you do this in-memory and have 10 users downloading 5x 100MB files each, you'll need 5GB+ of RAM just for the archiving. Doesn't seem particularly scalable. – Polynomial Jun 08 '12 at 10:44
  • 2
    @Polynomial, if you read the question, @Tener explicitly doesn't want to keep the whole archive in memory. There are plenty of implementations of `gzip` and `zip` that can compress content on-the-fly and stream it. – dflemstr Jun 08 '12 at 10:49
  • @dflemstr Whoops, missed the last part of the question. Still, this seems like it'd annihilate the server's CPU during even moderate load. – Polynomial Jun 08 '12 at 10:51
  • 2
    The two forms of compression that I mentioned are actually ridiculously fast; in some cases much faster than, say, SSL encryption. Did you know that [most big web-pages actually transfer all files as gzip-compressed data](http://en.wikipedia.org/wiki/HTTP_compression)? It's so fast that it's almost always worth doing. – dflemstr Jun 08 '12 at 10:55

1 Answer


It very much depends on which file formats you want to support (.zip, .tar.gz and .tar.bz2 are the most common), but you can use the zip-archive library to create .zip archives. These archives are produced as lazy ByteStrings, meaning that they will be generated on the fly. The only tricky part is to produce a value of type Archive with the correct contents. It might for example look like this:

import Codec.Archive.Zip
import Control.Monad (forM)
import qualified Data.ByteString.Lazy.Char8 as ByteString

-- ... and in your code:
let archiveTemplate = Archive
      { zComment   = ByteString.pack "Downloaded from mysite.com"
      , zSignature = Nothing
      , zEntries   = []
      }

let filesIWantToInclude = ["foo.png", "bar.iso"]
entries <- forM filesIWantToInclude (readEntry [])
let archive = foldr addEntryToArchive archiveTemplate entries

let byteString = fromArchive archive
-- Now you can send the byteString over the network, or something.
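Because the result is a lazy ByteString, a consumer that writes it out chunk by chunk only forces one chunk at a time, keeping memory use roughly constant. A minimal sketch of such a consumer (the `streamArchive` helper is my own illustration, not part of zip-archive; note that `Data.ByteString.Lazy.hPut` already does essentially this, and in Yesod you would hand the lazy ByteString to the response machinery instead of a raw handle):

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Lazy as L
import System.IO (Handle)

-- Write a lazy ByteString to a handle one strict chunk at a time.
-- Only the chunk currently being written is forced into memory, so
-- even a multi-gigabyte archive can be streamed without ever holding
-- it all at once.
streamArchive :: Handle -> L.ByteString -> IO ()
streamArchive h = mapM_ (BS.hPut h) . L.toChunks
```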

If you don't have files on your file system that you want to compress, but instead files in a database or something, you can manually build values of type Entry with the correct fields filled in. You only need a lazy ByteString representing the data you want to compress, nothing more; then you can use the toEntry function to generate an entry. It might be worth mentioning that the eRelativePath field in Entry is the relative path of the file inside of the .zip archive, not the actual relative path in the file system.
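For the database case, a minimal sketch using `toEntry` might look like this (the entry name, timestamp and payload are made up for illustration; the second argument to `toEntry` is a modification time in seconds since the Unix epoch):

```haskell
import Codec.Archive.Zip
import qualified Data.ByteString.Lazy.Char8 as L

-- Build an Entry from in-memory data instead of a file on disk.
-- "report.csv" is the path the entry will have *inside* the archive.
databaseEntry :: Entry
databaseEntry = toEntry "report.csv" 0 (L.pack "id,name\n1,foo\n")

-- Add it to an empty archive and render the whole thing lazily.
inMemoryArchive :: L.ByteString
inMemoryArchive = fromArchive (addEntryToArchive databaseEntry emptyArchive)
```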

dflemstr
  • I did look at this library. At first sight it appears to be a non-solution. I didn't test it, but from the source, readEntry uses toEntry, which tries to be smart and only compresses if it helps. The test is made by compressing the whole file and checking whether that made it smaller. I think this will cause the whole file to be stored in memory, and therefore the library will consume way too much memory. Manual creation of Entry might be possible, though; I would need code for the CRC32 calculation. – Tener Jun 08 '12 at 11:38
  • You could just copy the internal compression method and leave out the size comparison... – dflemstr Jun 08 '12 at 12:38