0

I am developing a web application in Python 3.9 using django 3.0.7 as a framework. In Python I am creating a function that can convert a dictionary to a json and then to a zip file using the ZipFile library. Currently this is the code in use:

def zip_dict(data: dict) -> bytes:
    with io.BytesIO() as archive:
        unzipped = bytes(json.dumps(data), "utf-8")
        with zipfile.ZipFile(archive, mode="w", compression=zipfile.ZIP_DEFLATED) as zipFile:
            zipFile.writestr(zinfo_or_arcname="data", data=unzipped)
        return archive.getvalue()

Then I save the zip in the Azure Blob Storage. It works but the problem is that this function is a bit slow for my purposes. I tried to use the zlib library but the performance doesn't change and, moreover, the created zip file seems to be corrupt (I can't even open it manually with WinRAR). Is there any other library to increase the compression speed (without touching the compression ratio)?

Tomerikoo
  • 18,379
  • 16
  • 47
  • 61
El_Merendero
  • 603
  • 3
  • 14
  • 28
  • To me, `the created zip file seems to be corrupt` indicates that you're doing something incorrectly. Also, I think there are a lot of unanswered questions here. How big are the files? Is the data in them relatively ordered or is it not that compressible? What is the compression ratio you get when zipping the file, either via command line tools or via Python? How long is it taking in Python compared to doing it not with Python? – Random Davis Jan 27 '22 at 16:24
  • I am currently working with a file which, if not compacted, is 300MB (in the future it could be even a bit bigger). Compacted it becomes about 60MB. With this Python function I compress everything in about 30s. With WinRAR it takes less than 10s. – El_Merendero Jan 27 '22 at 16:41
  • Do you have to do it in Python itself? Unless a Python library uses compiled native code in order to do the "heavy lifting", it's not going to be nearly as efficient as something compiled to native code. Can you not just call an external program via the command line, like 7-zip, WinRAR, or whatever, in order to do the zipping? The only reason I can imagine that you wouldn't be able to do that, is if you need this program to be useable on any platform. But if this is just something you're running for yourself on your own machine, there's no reason to not just call an external program, I think. – Random Davis Jan 27 '22 at 16:42
  • 1
    You could execute WinRAR from Python. – Mark Adler Jan 27 '22 at 16:42
  • 2
    zlib doesn't make zip files by itself. They are not "corrupt". They are a different format. – Mark Adler Jan 27 '22 at 16:43
  • https://stackoverflow.com/questions/16158648/faster-alternative-to-pythons-zipfile-module – Albert Aug 24 '23 at 20:32

1 Answers1

2

Try adding a compresslevel=3 or compresslevel=1 parameter to the zipfile.ZipFile() object creation, and see if that provides a more satisfactory speed and sufficient compression.

Mark Adler
  • 101,978
  • 13
  • 118
  • 158
  • This may help others: here are some notes of compresslevel's accepted values. ```compresslevel: None (default for the given compression type) or an integer specifying the level to pass to the compressor.``` ```When using ZIP_STORED or ZIP_LZMA this keyword has no effect.``` ```When using ZIP_DEFLATED integers 0 through 9 are accepted.``` ```When using ZIP_BZIP2 integers 1 through 9 are accepted.``` – beeeliu Oct 17 '22 at 22:34