I'm reading a CSV file that is about 1 GB on disk, but it winds up taking over 10 GB of memory. Since DictReader returns an iterator over dicts, each keyed by the fields of the header row, I can imagine the lines taking up twice as much space (~2 GB), but ten times as much? This confuses me.
import csv

def readeverything(filename):
    thefile = open(filename)
    reader = csv.DictReader(thefile, delimiter='\t')
    lines = []
    for datum in reader:
        lines.append(datum)  # keep every row as a dict
    thefile.close()
    return lines
The raw string for a line is actually smaller than the parsed dict. I found this out by calling sys.getsizeof on the first line in the file and on the first record returned by csv.DictReader. Even so, the size of the dictionary itself, at least as reported by sys.getsizeof, does not account for the roughly tenfold explosion of memory usage when reading the CSV.
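For reference, this is roughly how I ran the comparison (a minimal sketch; 'data.tsv' is just a placeholder for my actual file). Note that sys.getsizeof on a dict only counts the dict's own structure, not the key and value strings it references, so I also tried summing those:

import csv
import sys

# Size of one raw line of text from the file ('data.tsv' is a placeholder).
with open('data.tsv') as f:
    header = f.readline()
    first_line = f.readline()          # raw text of the first data row
print('raw line:', sys.getsizeof(first_line))

# Size of the same row after DictReader parses it into a dict.
with open('data.tsv') as f:
    reader = csv.DictReader(f, delimiter='\t')
    first_record = next(reader)
print('parsed dict (shallow):', sys.getsizeof(first_record))

# sys.getsizeof(dict) excludes the keys and values, so add them in too.
print('dict + contents:',
      sys.getsizeof(first_record)
      + sum(sys.getsizeof(k) + sys.getsizeof(v)
            for k, v in first_record.items()))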