I currently have the following csv writer class:
class csvwriter():
writer = None
writehandler = None
@classmethod
def open(cls,file):
cls.writehandler = open(file,'wb')
cls.writer = csv.writer(cls.writehandler, delimiter=',',quotechar='"', quoting=csv.QUOTE_NONNUMERIC)
@classmethod
def write(cls,arr):
cls.writer.writerow(arr)
@classmethod
def close(cls):
cls.writehandler.close()
which can generate proper csv files without ever having to store the full array in memory at a single time.
However, the files created through use of this code can be quite large, so I'm looking to compress them, rather than writing them uncompressed. (In order to save on disk usage). I can't effectively store the file in memory either, as I'm expecting files of well over 20gb to be a regular occurence.
The recipients of the resulting files are generally not sysadmins of their PCs, nor do they all use linux, so I'm constrained in the types of algorithms I'm able to use for this task. Preferably, the solution would use a compression scheme that's natively readable (no executables required) in Windows, OSX and any linux distribution.
I've found gzip provides a very handy interface in Python, but reading gzipped files in windows seems like quite a hassle.. Ideally I'd put them in a zip archive, but zip archive don't allow you to append data to files already present in the archive, which then forces me to store the whole file in memory, or write the data away to several smaller files that I would be able to fit in memory.
My question: Is there a solution that would benefit from the best of both worlds? Widespread availability of tools to read the target format on the end-user's machine, and also the ability to append, rather than write the whole file in one go?
Thanks in advance for your consideration!