Compressing strings and appending to file on the fly

Question

I currently have the following csv writer class:

class csvwriter():
    writer = None
    writehandler = None
    @classmethod
    def open(cls,file):
        cls.writehandler = open(file,'wb')
        cls.writer = csv.writer(cls.writehandler, delimiter=',',quotechar='"', quoting=csv.QUOTE_NONNUMERIC)

    @classmethod
    def write(cls,arr):
        cls.writer.writerow(arr)

    @classmethod
    def close(cls):
        cls.writehandler.close()

which can generate proper csv files without ever having to store the full array in memory at a single time.

However, the files created through use of this code can be quite large, so I'm looking to compress them, rather than writing them uncompressed. (In order to save on disk usage). I can't effectively store the file in memory either, as I'm expecting files of well over 20gb to be a regular occurence.

The recipients of the resulting files are generally not sysadmins of their PCs, nor do they all use linux, so I'm constrained in the types of algorithms I'm able to use for this task. Preferably, the solution would use a compression scheme that's natively readable (no executables required) in Windows, OSX and any linux distribution.

I've found gzip provides a very handy interface in Python, but reading gzipped files in windows seems like quite a hassle.. Ideally I'd put them in a zip archive, but zip archive don't allow you to append data to files already present in the archive, which then forces me to store the whole file in memory, or write the data away to several smaller files that I would be able to fit in memory.

My question: Is there a solution that would benefit from the best of both worlds? Widespread availability of tools to read the target format on the end-user's machine, and also the ability to append, rather than write the whole file in one go?

Thanks in advance for your consideration!

why do you think you ever have to store all the data in memory to write it? also there is not really much point in using a class to do what you are doing — Padraic Cunningham, Apr 25 '15 at 16:27
Hi Padraic, The point about the class is fair enough. I noticed my code contained the gzip.open function by accident, I intended to give the naive version without compression. Gzip is usable, but difficult for non-linux users to decompress (requires executables, which my end-users might not necessarily have access to). — Erwin Haasnoot, Apr 25 '15 at 17:08
I've found the following blog post - http://swl10.blogspot.nl/2012/12/writing-stream-to-zipfile-in-python.html - I'll try and see if this is a fruitful path to follow. — Erwin Haasnoot, Apr 25 '15 at 17:10
There is a lib that works cross platform with zip files, if I can think of the name I will let you know. You could basically mimic what they are doing to work for you. — Padraic Cunningham, Apr 25 '15 at 17:31
http://stackoverflow.com/questions/297345/create-a-zip-file-from-a-generator-in-python seems to be the exact same question. I can't mark this a duplicate yet. — Erwin Haasnoot, Apr 25 '15 at 17:33
Thanks, Padraic. I'd love to know which library that would be, the proposed solution in the SO question I linked is not a very clean solution.. but the answer is workable. — Erwin Haasnoot, Apr 25 '15 at 17:35
I will have a root around later, it is a while back that I saw it while looking for something related. — Padraic Cunningham, Apr 25 '15 at 17:36
Seems like the solution writes to the zip archive just fine, there just doesn't seem to be any compression, which is obviously not ideal.. — Erwin Haasnoot, Apr 25 '15 at 18:41

score 2 · Answer 1 · answered Apr 25 '15 at 19:05

2

gzlog may provide the functionality you're looking for. It efficiently appends short strings to a gzip file, intended for applications where short messages are appended to a long log.

answered Apr 25 '15 at 19:05

Mark Adler

101,978
13
118
158

Hi Mark, Thanks for the suggestion! I'm trying to compress files intended for end-users who are likely to work on windows machines, and who don't have administrator rights to install software. For that reason, I don't think gzip will be a good idea, but please correct me if I'm wrong! – Erwin Haasnoot Apr 25 '15 at 19:52
1

You can use the same technique to append to a single-entry zip file. Both zip and gzip use the deflate compressed data format. – Mark Adler Apr 25 '15 at 22:25

Compressing strings and appending to file on the fly

1 Answers1