5

In a web app I am working on, the user can create a zip archive of a folder full of files. Here here's the code:

files = torrent[0].files
    zipfile = z.ZipFile(zipname, 'w')
    output = ""

    for f in files:
        zipfile.write(settings.PYRAT_TRANSMISSION_DOWNLOAD_DIR + "/" + f.name, f.name)

downloadurl = settings.PYRAT_DOWNLOAD_BASE_URL + "/" + settings.PYRAT_ARCHIVE_DIR + "/" + filename
output = "Download <a href=\"" + downloadurl + "\">" + torrent_name + "</a>"
return HttpResponse(output)

But this has the nasty side effect of a long wait (10+ seconds) while the zip archive is being downloaded. Is it possible to skip this? Instead of saving the archive to a file, is it possible to send it straight to the user?

I do beleive that torrentflux provides this excat feature I am talking about. Being able to zip GBs of data and download it within a second.

Josh Hunt
  • 14,225
  • 26
  • 79
  • 98

5 Answers5

13

Check this Serving dynamically generated ZIP archives in Django

Community
  • 1
  • 1
jitter
  • 53,475
  • 11
  • 111
  • 124
9

As mandrake says, constructor of HttpResponse accepts iterable objects.

Luckily, ZIP format is such that archive can be created in single pass, central directory record is located at the very end of file:

enter image description here

(Picture from Wikipedia)

And luckily, zipfile indeed doesn't do any seeks as long as you only add files.

Here is the code I came up with. Some notes:

  • I'm using this code for zipping up a bunch of JPEG pictures. There is no point compressing them, I'm using ZIP only as container.
  • Memory usage is O(size_of_largest_file) not O(size_of_archive). And this is good enough for me: many relatively small files that add up to potentially huge archive
  • This code doesn't set Content-Length header, so user doesn't get nice progress indication. It should be possible to calculate this in advance if sizes of all files are known.
  • Serving the ZIP straight to user like this means that resume on downloads won't work.

So, here goes:

import zipfile

class ZipBuffer(object):
    """ A file-like object for zipfile.ZipFile to write into. """

    def __init__(self):
        self.data = []
        self.pos = 0

    def write(self, data):
        self.data.append(data)
        self.pos += len(data)

    def tell(self):
        # zipfile calls this so we need it
        return self.pos

    def flush(self):
        # zipfile calls this so we need it
        pass

    def get_and_clear(self):
        result = self.data
        self.data = []
        return result

def generate_zipped_stream():
    sink = ZipBuffer()
    archive = zipfile.ZipFile(sink, "w")
    for filename in ["file1.txt", "file2.txt"]:
        archive.writestr(filename, "contents of file here")
        for chunk in sink.get_and_clear():
            yield chunk

    archive.close()
    # close() generates some more data, so we yield that too
    for chunk in sink.get_and_clear():
        yield chunk

def my_django_view(request):
    response = HttpResponse(generate_zipped_stream(), mimetype="application/zip")
    response['Content-Disposition'] = 'attachment; filename=archive.zip'
    return response
Pēteris Caune
  • 43,578
  • 6
  • 59
  • 81
5

Here's a simple Django view function which zips up (as an example) any readable files in /tmp and returns the zip file.

from django.http import HttpResponse
import zipfile
import os
from cStringIO import StringIO # caveats for Python 3.0 apply

def somezip(request):
    file = StringIO()
    zf = zipfile.ZipFile(file, mode='w', compression=zipfile.ZIP_DEFLATED)
    for fn in os.listdir("/tmp"):
        path = os.path.join("/tmp", fn)
        if os.path.isfile(path):
            try:
                zf.write(path)
            except IOError:
                pass
    zf.close()
    response = HttpResponse(file.getvalue(), mimetype="application/zip")
    response['Content-Disposition'] = 'attachment; filename=yourfiles.zip'
    return response

Of course this approach will only work if the zip files will conveniently fit into memory - if not, you'll have to use a disk file (which you're trying to avoid). In that case, you just replace the file = StringIO() with file = open('/path/to/yourfiles.zip', 'wb') and replace the file.getvalue() with code to read the contents of the disk file.

Vinay Sajip
  • 95,872
  • 14
  • 179
  • 191
2

Does the zip library you are using allow for output to a stream. You could stream directly to the user instead of temporarily writing to a zip file THEN streaming to the user.

Brad Bruce
  • 7,638
  • 3
  • 39
  • 60
0

It is possible to pass an iterator to the constructor of a HttpResponse (see docs). That would allow you to create a custom iterator that generates data as it is being requested. However I don't think that will work with a zip (you would have to send partial zip as it is being created).

The proper way, I think, would be to create the files offline, in a separate process. The user could then monitor the progress and then download the file when its ready (possibly by using the iterator method described above). This would be similar what sites like youtube use when you upload a file and wait for it to be processed.

mandrake
  • 1,213
  • 1
  • 14
  • 28