11

I'm working on a reporting application for my Django-powered website. I want to run several reports, have each one generate a .csv file in memory, and offer them for download in batch as a single .zip, all without storing any files on disk. So far, to generate a single .csv file, I am following the usual pattern:

import csv
import StringIO
from django.http import HttpResponse

mem_file = StringIO.StringIO()
writer = csv.writer(mem_file)
writer.writerow(["My content", my_value])
mem_file.seek(0)
response = HttpResponse(mem_file, content_type='text/csv')
response['Content-Disposition'] = 'attachment; filename=my_file.csv'

This works fine, but only for a single, unzipped .csv. If I had, for example, a list of .csv files created with a StringIO stream:

firstFile = StringIO.StringIO()
# write some data to the file

secondFile = StringIO.StringIO()
# write some data to the file

thirdFile = StringIO.StringIO()
# write some data to the file

myFiles = [firstFile, secondFile, thirdFile]

How could I return a compressed file that contains all objects in myFiles and can be properly unzipped to reveal three .csv files?

Jamie Counsell

3 Answers

16

zipfile is a standard library module that does exactly what you're looking for. For your use case, the meat and potatoes is a method called "writestr", which takes the name a file should have inside the archive and the data to store under that name.

In the code below, I've used a sequential naming scheme for the files when they're unzipped, but this can be switched to whatever you'd like.

import zipfile
import StringIO

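# files is a list of in-memory StringIO objects, like myFiles in the question
zipped_file = StringIO.StringIO()
with zipfile.ZipFile(zipped_file, 'w') as archive:
    for i, csv_file in enumerate(files):
        # rewind so read() returns the whole buffer
        csv_file.seek(0)
        archive.writestr("{}.csv".format(i), csv_file.read())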

zipped_file.seek(0)

If you want to future-proof your code (hint hint Python 3 hint hint), you might want to switch over to io.BytesIO for the archive buffer instead of StringIO, since zip data is binary and Python 3 keeps bytes and text strictly separate. The version below also calls getvalue() on each CSV buffer, which returns the entire contents regardless of the current position, so the per-file seek(0) is no longer needed (I haven't tested how Django's HttpResponse reads the buffer, so I've left that final seek in there just in case).

import io
import zipfile

zipped_file = io.BytesIO()
with zipfile.ZipFile(zipped_file, 'w') as f:
    for i, file in enumerate(files):
        f.writestr("{}.csv".format(i), file.getvalue())

zipped_file.seek(0)
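
For anyone wondering what `files` holds, here is a minimal Python 3 sketch. It assumes each report is an in-memory `io.StringIO` that a `csv.writer` has written to. The helper names `make_csv` and `zip_csvs` and the sample rows are made up for illustration; they are not part of any library.

```python
import csv
import io
import zipfile


def make_csv(rows):
    """Write rows to an in-memory text buffer and return the buffer."""
    buf = io.StringIO(newline="")
    writer = csv.writer(buf)
    writer.writerows(rows)
    return buf


def zip_csvs(files):
    """Pack in-memory CSV buffers into one in-memory zip archive."""
    zipped_file = io.BytesIO()
    with zipfile.ZipFile(zipped_file, "w") as archive:
        for i, csv_buf in enumerate(files):
            # getvalue() returns the whole buffer regardless of position.
            archive.writestr("{}.csv".format(i), csv_buf.getvalue())
    zipped_file.seek(0)
    return zipped_file


# Hypothetical report data, for illustration only.
files = [
    make_csv([["name", "total"], ["alice", 3]]),
    make_csv([["name", "total"], ["bob", 5]]),
]
zipped_file = zip_csvs(files)

# Reopen the archive to confirm it contains one entry per report.
with zipfile.ZipFile(zipped_file) as archive:
    names = archive.namelist()
    first = archive.read("0.csv").decode("utf-8")
```

Here `names` lists one entry per buffer in `files`, and `first` holds the text of the first report as it would appear after unzipping.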
Dan Loewenherz
  • 2
    Complete and comprehensive, and thank you for including the BytesIO information for the future! This method crossed my mind, but for some reason I didn't think it was possible, as I thought the content_type was what identified the file as a .csv. I guess writing the extension the way you did does the trick. Thanks! I have to wait a few more hours to award the bounty. – Jamie Counsell Aug 08 '14 at 19:30
  • 1
    Glad to have helped! :) – Dan Loewenherz Aug 08 '14 at 22:09
  • @DanLoewenherz can you please tell me what "files" is in for i, file in enumerate(files), and what it contains? – snehil singh Mar 15 '19 at 17:15
2

The stdlib comes with the module zipfile, and the main class, ZipFile, accepts a file or file-like object:

import StringIO
from zipfile import ZipFile

temp_file = StringIO.StringIO()
zipped = ZipFile(temp_file, 'w')

# create temp csv_files = [(name1, data1), (name2, data2), ... ]

for name, data in csv_files:
    data.seek(0)
    zipped.writestr(name, data.read())

zipped.close()

temp_file.seek(0)

# etc. etc.

I'm not a user of StringIO so I may have the seek and read out of place, but hopefully you get the idea.
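
On the seek-and-read question, a small sketch may help. It uses Python 3's `io.StringIO` (an assumption; the snippet above targets the Python 2 `StringIO` module) and made-up sample content to show why the rewind matters for `read()` but not for `getvalue()`.

```python
import io

buf = io.StringIO()
buf.write("a,b\n1,2\n")

# After writing, the position sits at the end, so read() finds nothing.
at_end = buf.read()

# getvalue() ignores the position and returns the whole buffer.
whole = buf.getvalue()

# Rewinding first makes read() return the same content.
buf.seek(0)
rewound = buf.read()
```

So the `data.seek(0)` before `data.read()` is in the right place; calling `data.getvalue()` would be an alternative that needs no seek.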

Ethan Furman
  • 1
    I would suggest cStringIO for performance: it's written entirely in C instead of Python and already ships with the Python standard library, so it should have less memory overhead too. – denisvm Aug 08 '14 at 01:29
1
import zipfile
from StringIO import StringIO  # Python 3: use io.BytesIO instead

from django.http import HttpResponse

def zip_files(files):
    outfile = StringIO()  # io.BytesIO() for Python 3
    with zipfile.ZipFile(outfile, 'w') as zf:
        for n, f in enumerate(files):
            zf.writestr("{}.csv".format(n), f.getvalue())
    return outfile.getvalue()

zipped_file = zip_files(myFiles)
response = HttpResponse(zipped_file, content_type='application/octet-stream')
response['Content-Disposition'] = 'attachment; filename=my_file.zip'

StringIO has a getvalue method which returns the entire contents of the buffer. You can compress the archive by opening it with zipfile.ZipFile(outfile, 'w', zipfile.ZIP_DEFLATED). The default compression is ZIP_STORED, which creates the zip file without compressing its contents.
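
To see the difference between the two compression settings, here is a rough Python 3 sketch. The helper `zip_bytes`, the entry name `report.csv`, and the sample data are made up for illustration, and `ZIP_DEFLATED` assumes the `zlib` module is available in your Python build.

```python
import io
import zipfile


def zip_bytes(name, data, compression):
    """Return a zip archive (as bytes) holding a single entry."""
    out = io.BytesIO()
    with zipfile.ZipFile(out, "w", compression) as zf:
        zf.writestr(name, data)
    return out.getvalue()


# Hypothetical, highly repetitive CSV content for illustration.
data = "id,value\n" + "1,hello\n" * 1000

stored = zip_bytes("report.csv", data, zipfile.ZIP_STORED)
deflated = zip_bytes("report.csv", data, zipfile.ZIP_DEFLATED)

# Print both archive sizes; the deflated one should be much smaller here.
print(len(stored), len(deflated))
```

How much you save depends on the data; repetitive text like CSV reports usually compresses well.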

ashwin