Reading a compressed/deflated (csv) file line by line

Question

I'm using the following generator to iterate through a given csv file row by row in a memory efficient way:

import csv

def csvreader(file):
    # Python 2: the csv module expects files opened in binary mode
    with open(file, 'rb') as csvfile:
        reader = csv.reader(csvfile, delimiter=',', quotechar='"')
        for row in reader:
            yield row

This works perfectly and I am able to handle very large files incredibly well. A CSV file of several gigabytes seems to be no problem at all for a small virtual machine instance with limited RAM.

However, when files grow too large, disk space becomes a problem. CSV files usually compress very well, so I can store them at a fraction of their uncompressed size. But before I can run the code above on a file, I have to decompress (inflate) the whole file first.

My question: is there any way to build an efficient generator that does the above (given a file, yield CSV rows as lists) but inflates the file piece by piece, up to the next newline, and passes each piece through the CSV reader, without ever having to inflate (decompress) the file as a whole?

Thanks very much for your consideration!
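What the question describes is possible because DEFLATE data can be inflated incrementally. The following is a minimal sketch of that mechanism, assuming a single-member gzip file containing UTF-8 text; `inflate_lines` and `csv_rows_from_gzip` are hypothetical names. It feeds fixed-size compressed chunks to `zlib.decompressobj`, cuts the inflated bytes at `b"\n"`, and hands complete lines to `csv.reader`. In practice `gzip.open` performs the same kind of buffering internally, so this only makes the steps visible.

```python
import csv
import zlib


def inflate_lines(path, chunk_size=64 * 1024, encoding="utf-8"):
    """Yield decoded text lines from a gzip file without inflating it all at once.

    Assumes a single-member gzip file holding text in an ASCII-compatible
    encoding such as UTF-8, so a b"\\n" byte always marks a line end.
    """
    # 16 + MAX_WBITS makes zlib expect (and skip) the gzip header and trailer.
    inflater = zlib.decompressobj(16 + zlib.MAX_WBITS)
    pending = b""  # inflated bytes that do not yet form a complete line
    with open(path, "rb") as compressed:
        while True:
            chunk = compressed.read(chunk_size)
            if not chunk:
                break
            pending += inflater.decompress(chunk)
            # Everything before the last b"\n" is complete; keep the remainder.
            *complete, pending = pending.split(b"\n")
            for line in complete:
                yield (line + b"\n").decode(encoding)
    # Emit whatever the inflater still holds, then any final unterminated line.
    pending += inflater.flush()
    *complete, pending = pending.split(b"\n")
    for line in complete:
        yield (line + b"\n").decode(encoding)
    if pending:
        yield pending.decode(encoding)


def csv_rows_from_gzip(path):
    # csv.reader accepts any iterable of lines and pulls the next line itself
    # when a quoted field contains an embedded newline.
    yield from csv.reader(inflate_lines(path), delimiter=",", quotechar='"')
```

Memory use here is bounded by roughly one chunk's worth of inflated data plus one partial line, not by the size of the file. A file built from several concatenated gzip members would stop after the first member in this loop, because it creates only one decompressor.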

Ideally, I'd love a solution for the reverse as well: given a row (a list), encode it as CSV, compress (deflate) it, and append it to an existing file. I realise this might be harder, but maybe there is some way to read the header of a compressed file and use its compression scheme to compress a given string? — Erwin Haasnoot, Apr 23 '15 at 08:36
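For the reverse direction, one option is to rely on the fact that a gzip file may consist of several concatenated members, and that Python's `gzip` module reads them back as one continuous stream. A minimal sketch, assuming UTF-8 text; the helper name `append_rows` is hypothetical. Opening the file in append mode writes the new rows as a fresh gzip member, so the existing compressed data is never re-read or re-compressed.

```python
import csv
import gzip


def append_rows(path, rows):
    """Encode rows as CSV, compress them, and append them to a .csv.gz file.

    Mode "at" starts a new gzip member at the end of the file (creating the
    file if needed); gzip.open in read mode treats all members as one stream.
    """
    with gzip.open(path, "at", newline="", encoding="utf-8") as out:
        csv.writer(out).writerows(rows)
```

Each call adds a small gzip header and trailer and starts compression from an empty dictionary, so many tiny appends compress noticeably worse than one large write; batching rows per call helps.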
The best chance would be to open the file as a `GzipFile` ( https://docs.python.org/2/library/gzip.html ) and test the memory consumption. Please note that compression has a large impact on file I/O. — Klaus D., Apr 23 '15 at 08:39
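Following the suggestion to open the file as a `GzipFile` and measure memory consumption, a sketch along these lines could be used; `gzip_csv_rows` and `peak_bytes_while_streaming` are hypothetical helpers and UTF-8 text is assumed. `tracemalloc` only sees allocations made through Python's memory allocators, so its figure is an approximation rather than the process's full resident size.

```python
import csv
import gzip
import io
import tracemalloc


def gzip_csv_rows(path, encoding="utf-8"):
    """Yield CSV rows from a gzip file, decompressing one buffer at a time."""
    # GzipFile yields bytes; TextIOWrapper decodes them for csv.reader.
    # newline="" leaves line endings untouched, as the csv docs recommend.
    # Closing the wrapper also closes the underlying GzipFile.
    with io.TextIOWrapper(gzip.GzipFile(path, "rb"), encoding=encoding, newline="") as text:
        yield from csv.reader(text)


def peak_bytes_while_streaming(path):
    """Return (row_count, peak traced bytes) for one full pass over the file."""
    tracemalloc.start()
    try:
        count = sum(1 for _ in gzip_csv_rows(path))
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return count, peak
```

If streaming works as intended, the peak should stay far below the uncompressed size of the file, regardless of how many rows it contains.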

score 1 · Accepted Answer · edited May 23 '17 at 11:58

Try using gzip

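Just replace `with open(file, 'rb') as csvfile:` with `with gzip.open(file, 'rb') as csvfile:` and add `import gzip` at the top of your script. (This works as-is on Python 2; on Python 3, open the file in text mode, e.g. `gzip.open(file, 'rt', newline='')`, because `csv.reader` needs strings rather than bytes.)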
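Applied to the generator from the question, a Python 3 sketch might look like this. It assumes a gzip-compressed file containing UTF-8 text; the function name `csvreader_gz` is hypothetical. Mode `"rt"` makes `gzip.open` return decoded text, which `csv.reader` requires on Python 3, and `newline=""` keeps quoted fields with embedded newlines intact.

```python
import csv
import gzip


def csvreader_gz(file):
    """Yield rows from a gzip-compressed CSV file one at a time.

    gzip.open inflates the file in small buffers as rows are requested,
    so the decompressed file never has to exist in full, on disk or in RAM.
    """
    with gzip.open(file, "rt", newline="", encoding="utf-8") as csvfile:
        reader = csv.reader(csvfile, delimiter=",", quotechar='"')
        for row in reader:
            yield row
```

Because it is still a generator, code that loops over `csvreader(file)` only needs the function swapped.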

See this SO question for more

answered Apr 23 '15 at 09:07

mirosval

score 1 · Answer 2 · answered Apr 23 '15 at 09:17

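On Python 2, if you use `from gzip import open`, you do not need to change your code at all! This shadows the built-in `open` in that module, and on Python 3 you would still have to change the mode from `'rb'` to `'rt'`, because `csv.reader` needs strings rather than bytes.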

mkrieger1