I'm using the following generator to iterate through a given CSV file row by row in a memory-efficient way:

import csv

def csvreader(file):
    # Python 2: the csv module expects the file to be opened in binary mode
    with open(file, 'rb') as csvfile:
        reader = csv.reader(csvfile, delimiter=',', quotechar='"')
        for row in reader:
            yield row

This works perfectly and I am able to handle very large files incredibly well. A CSV file of several gigabytes seems to be no problem at all for a small virtual machine instance with limited RAM.

However, when files grow too large, disk space becomes a problem. CSV files generally compress very well, which lets me store them at a fraction of their uncompressed size. But before I can use the above code on a file, I have to decompress (inflate) it to disk and then run it through my script.

My question: is there a way to build an efficient generator that does the same thing (given a file, yield each CSV row as a list), but does so by inflating the file piece by piece, up to the next newline, and feeding that to the csv reader, without ever having to inflate/decompress the file as a whole?

Thanks very much for your consideration!

  • Ideally, I'd love to have a solution for the reverse as well. Given an array, encode it in a CSV compatible way, deflate and then append it to an existing file. Although I realise this might be harder to do, maybe there'd be some way to read the header of a compressed file and use that compression scheme to compress a given string? – Erwin Haasnoot Apr 23 '15 at 08:36
  • The best chance would be to open the file as a `GzipFile` ( https://docs.python.org/2/library/gzip.html ) and test the memory consumption. Please note that compression has a large impact on file I/O. – Klaus D. Apr 23 '15 at 08:39

2 Answers

Try using gzip

Just replace `with open(file, 'rb') as csvfile:` with `with gzip.open(file, 'rb') as csvfile:` and add `import gzip` at the top of your script.

See this SO question for more
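
For reference, the swap described above could look like the following on Python 3. This is only a sketch under a few assumptions that are not in the question: the name `gzip_csvreader` is made up for illustration, the file is assumed to be UTF-8 encoded, and text mode (`'rt'`) is used because the Python 3 csv module reads text rather than bytes. On Python 2, the `'rb'` mode shown above is what you would keep.

```python
import csv
import gzip


def gzip_csvreader(path):
    """Yield rows from a gzip-compressed CSV file, one row at a time."""
    # 'rt' decompresses on the fly and decodes to text (UTF-8 is an assumption);
    # newline='' lets the csv module handle line endings inside quoted fields.
    with gzip.open(path, 'rt', encoding='utf-8', newline='') as csvfile:
        reader = csv.reader(csvfile, delimiter=',', quotechar='"')
        for row in reader:
            yield row
```

The file is decompressed as it is read, so no uncompressed copy is ever written to disk. Memory use should stay close to that of the uncompressed version, though it is worth measuring on your own data.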

mirosval

If you add `from gzip import open`, you do not need to change your code at all!
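
The reverse direction raised in the comments (encode rows as CSV, compress them, and append to an existing file) can probably be handled by the same module, since opening a gzip file in append mode adds a new compressed member at the end and the reader treats the concatenated members as one stream. A rough Python 3 sketch, where `append_rows` is a hypothetical helper name and UTF-8 is an assumed encoding:

```python
import csv
import gzip


def append_rows(path, rows):
    """Append rows to a gzip-compressed CSV file without rewriting it."""
    # 'at' appends a new gzip member in text mode; existing data is untouched.
    # The file is created if it does not exist yet.
    with gzip.open(path, 'at', encoding='utf-8', newline='') as csvfile:
        writer = csv.writer(csvfile, delimiter=',', quotechar='"')
        writer.writerows(rows)
```

Each call compresses its rows separately, so many tiny appends will compress worse than one large write. Batching rows before calling it is likely the better trade-off.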

mkrieger1