70

How to serve users a dynamically generated ZIP archive in Django?

I'm making a site, where users can choose any combination of available books and download them as ZIP archive. I'm worried that generating such archives for each request would slow my server down to a crawl. I have also heard that Django doesn't currently have a good solution for serving dynamically generated files.

zuber
  • 3,449
  • 2
  • 24
  • 19

10 Answers10

49

The solution is as follows.

Use Python module zipfile to create zip archive, but as the file specify StringIO object (ZipFile constructor requires file-like object). Add files you want to compress. Then in your Django application return the content of StringIO object in HttpResponse with mimetype set to application/x-zip-compressed (or at least application/octet-stream). If you want, you can set content-disposition header, but this should not be really required.

But beware, creating zip archives on each request is bad idea and this may kill your server (not counting timeouts if the archives are large). Performance-wise approach is to cache generated output somewhere in filesystem and regenerate it only if source files have changed. Even better idea is to prepare archives in advance (eg. by cron job) and have your web server serving them as usual statics.

Engineero
  • 12,340
  • 5
  • 53
  • 75
zgoda
  • 12,775
  • 4
  • 37
  • 46
  • StringIO will be gone in Python 3.0, so you may want to bracket your code accordingly. – Jeff Bauer Jan 07 '09 at 19:27
  • 14
    It's not gone, just moved to the io module. http://docs.python.org/3.0/library/io.html#io.StringIO –  Jun 14 '09 at 15:52
  • 1
    Just as a thought, as you're already manually creating a HttpResponse, could you not use that as the buffer? By that I mean pass the response to `zipfile` and let it write directly into that. I've done it with other things. If you're dealing with hefty streams, it might be faster and more memory efficient. – Oli Jan 17 '12 at 18:34
  • @Oli would be nice, but ZipFile requires `f.seek()` which `HttpResponse` doesn't support – dbr Oct 18 '12 at 09:33
46

Here's a Django view to do this:

import os
import zipfile
import StringIO

from django.http import HttpResponse


def getfiles(request):
    # Files (local path) to put in the .zip
    # FIXME: Change this (get paths from DB etc)
    filenames = ["/tmp/file1.txt", "/tmp/file2.txt"]

    # Folder name in ZIP archive which contains the above files
    # E.g [thearchive.zip]/somefiles/file2.txt
    # FIXME: Set this to something better
    zip_subdir = "somefiles"
    zip_filename = "%s.zip" % zip_subdir

    # Open StringIO to grab in-memory ZIP contents
    s = StringIO.StringIO()

    # The zip compressor
    zf = zipfile.ZipFile(s, "w")

    for fpath in filenames:
        # Calculate path for file in zip
        fdir, fname = os.path.split(fpath)
        zip_path = os.path.join(zip_subdir, fname)

        # Add file, at correct path
        zf.write(fpath, zip_path)

    # Must close zip for all contents to be written
    zf.close()

    # Grab ZIP file from in-memory, make response with correct MIME-type
    resp = HttpResponse(s.getvalue(), mimetype = "application/x-zip-compressed")
    # ..and correct content-disposition
    resp['Content-Disposition'] = 'attachment; filename=%s' % zip_filename

    return resp
dbr
  • 165,801
  • 69
  • 278
  • 343
  • 2
    Not needed in this example, but in general make sure the filename in the content-disposition header is quoted and escaped as needed. For example, if there's a space in the filename, most browsers will only use the part up to the space for the filename (e.g. `attachment; filename=Test File.zip` gets saved as `Test`.) – Mike DeSimone May 15 '13 at 19:09
  • @MikeDeSimone Good point. Is there a good method to escape the filename for such a context? – dbr May 16 '13 at 15:00
  • http://stackoverflow.com/questions/93551/how-to-encode-the-filename-parameter-of-content-disposition-header-in-http – Mike DeSimone May 16 '13 at 19:19
  • 8
    for django Version > 1.7 use content_type instead of mimetype – renzop Jan 18 '17 at 17:01
  • 2
    Can I replace this with `b = BytesIO.BytesIO()` for binary fils? – qarthandso Jul 25 '17 at 17:18
34

Many answers here suggest to use a StringIO or BytesIO buffer. However this is not needed as HttpResponse is already a file-like object:

response = HttpResponse(content_type='application/zip')
zip_file = zipfile.ZipFile(response, 'w')
for filename in filenames:
    zip_file.write(filename)
response['Content-Disposition'] = 'attachment; filename={}'.format(zipfile_name)
return response

Note that you should not call zip_file.close() as the open "file" is response and we definitely don't want to close it.

Antoine Pinsard
  • 33,148
  • 8
  • 67
  • 87
11

I used Django 2.0 and Python 3.6.

import zipfile
import os
from io import BytesIO

def download_zip_file(request):
    filelist = ["path/to/file-11.txt", "path/to/file-22.txt"]

    byte_data = BytesIO()
    zip_file = zipfile.ZipFile(byte_data, "w")

    for file in filelist:
        filename = os.path.basename(os.path.normpath(file))
        zip_file.write(file, filename)
    zip_file.close()

    response = HttpResponse(byte_data.getvalue(), content_type='application/zip')
    response['Content-Disposition'] = 'attachment; filename=files.zip'

    # Print list files in zip_file
    zip_file.printdir()

    return response
Pasha M
  • 221
  • 3
  • 5
  • Hey, I have the same goal to be done but instead of filelist, I have multiple images url and it needs to be downloaded and zipped and then given out as response, any idea how to stream this, I mean I have a working code, all I needed to do was use requests to get the image and write it to BytesIO and then to zip_file, but if the images are large in size it takes too much time to download and then it times out. Any help is fine. TY – yashas123 Feb 09 '20 at 15:06
  • This is a bad answer. You are loading the entire zipfile in memory. Imagine a 10GB file – sandes May 19 '20 at 05:33
  • good answer if you are dealing with small few files btw – Jotunheim Nov 25 '20 at 08:05
8

For python3 i use the io.ByteIO since StringIO is deprecated to achieve this. Hope it helps.

import io

def my_downloadable_zip(request):
    zip_io = io.BytesIO()
    with zipfile.ZipFile(zip_io, mode='w', compression=zipfile.ZIP_DEFLATED) as backup_zip:
        backup_zip.write('file_name_loc_to_zip') # u can also make use of list of filename location
                                                 # and do some iteration over it
     response = HttpResponse(zip_io.getvalue(), content_type='application/x-zip-compressed')
     response['Content-Disposition'] = 'attachment; filename=%s' % 'your_zipfilename' + ".zip"
     response['Content-Length'] = zip_io.tell()
     return response
pitaside
  • 650
  • 10
  • 14
  • Using code just like this, I cannot get the file to be named correctly. At the moment, it's just a random string that looks like a UUID. – freethebees Mar 28 '17 at 10:09
6

Django doesn't directly handle the generation of dynamic content (specifically Zip files). That work would be done by Python's standard library. You can take a look at how to dynamically create a Zip file in Python here.

If you're worried about it slowing down your server you can cache the requests if you expect to have many of the same requests. You can use Django's cache framework to help you with that.

Overall, zipping files can be CPU intensive but Django shouldn't be any slower than another Python web framework.

bfox
  • 274
  • 2
  • 13
Cristian
  • 42,563
  • 25
  • 88
  • 99
5

Shameless plug: you can use django-zipview for the same purpose.

After a pip install django-zipview:

from zipview.views import BaseZipView

from reviews import Review


class CommentsArchiveView(BaseZipView):
    """Download at once all comments for a review."""

    def get_files(self):
        document_key = self.kwargs.get('document_key')
        reviews = Review.objects \
            .filter(document__document_key=document_key) \
            .exclude(comments__isnull=True)

        return [review.comments.file for review in reviews if review.comments.name]
Thibault J
  • 4,336
  • 33
  • 44
2

I suggest to use separate model for storing those temp zip files. You can create zip on-fly, save to model with filefield and finally send url to user.

Advantages:

  • Serving static zip files with django media mechanism (like usual uploads).
  • Ability to cleanup stale zip files by regular cron script execution (which can use date field from zip file model).
carefulweb
  • 347
  • 1
  • 4
0

A lot of contributions were made to the topic already, but since I came across this thread when I first researched this problem, I thought I'd add my own two cents.

Integrating your own zip creation is probably not as robust and optimized as web-server-level solutions. At the same time, we're using Nginx and it doesn't come with a module out of the box.

You can, however, compile Nginx with the mod_zip module (see here for a docker image with the latest stable Nginx version, and an alpine base making it smaller than the default Nginx image). This adds the zip stream capabilities.

Then Django just needs to serve a list of files to zip, all done! It is a little more reusable to use a library for this file list response, and django-zip-stream offers just that.

Sadly it never really worked for me, so I started a fork with fixes and improvements.

You can use it in a few lines:

def download_view(request, name=""):
    from django_zip_stream.responses import FolderZipResponse
    path = settings.STATIC_ROOT
    path = os.path.join(path, name)

    return FolderZipResponse(path)

You need a way to have Nginx serve all files that you want to archive, but that's it.

yspreen
  • 1,759
  • 2
  • 20
  • 44
-1

Can't you just write a link to a "zip server" or whatnot? Why does the zip archive itself need to be served from Django? A 90's era CGI script to generate a zip and spit it to stdout is really all that's required here, at least as far as I can see.

Andy Ross
  • 11,699
  • 1
  • 34
  • 31