I'm using the following generator to iterate through a given CSV file row by row in a memory-efficient way:
import csv

def csvreader(file):
    # newline='' lets the csv module handle line endings itself
    with open(file, 'r', newline='') as csvfile:
        reader = csv.reader(csvfile, delimiter=',', quotechar='"')
        for row in reader:
            yield row
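I consume it roughly like this (data.csv and process() are just stand-ins for my actual file and per-row logic):

for row in csvreader('data.csv'):
    process(row)  # handle one row at a time; nothing else is held in memory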
This works perfectly and handles very large files well: a CSV file of several gigabytes is no problem at all for a small virtual machine instance with limited RAM.
However, when files grow too large, disk space becomes a problem. CSV files generally compress very well, which allows me to store them at a fraction of their uncompressed size, but before I can run a file through the code above, I have to decompress/inflate it in full.
My question: is there any way to build an efficient generator that does the above (given a file, yields CSV rows as lists), but by inflating only part of the file at a time, up until a newline is reached, and feeding that through the csv reader, without ever having to decompress the file as a whole?
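For illustration, here is a minimal sketch of what I have in mind, assuming the files are gzip-compressed (gzip.open appears to inflate the stream lazily as it is read, and bz2.open / lzma.open seem to offer the same interface):

import csv
import gzip

def gz_csvreader(file):
    # gzip.open in text mode decompresses incrementally as the csv reader
    # pulls lines from it, so only a small buffer of inflated data is ever
    # held in memory rather than the whole decompressed file.
    with gzip.open(file, 'rt', newline='') as csvfile:
        reader = csv.reader(csvfile, delimiter=',', quotechar='"')
        for row in reader:
            yield row

Is this the right approach, or is there a more efficient way to stream partial decompression into the csv module?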
Thanks very much for your consideration!