Save memory overhead with DictReader

Question

I use csv.DictReader to read data from a CSV file. When the reader is iterated over, it yields dictionaries with keys taken from the CSV header and the values per row:

import csv

with open(filename) as h:
    data = csv.DictReader(h)
    for row in data:
        pass  # row is a dict keyed by the header names

Each row is a dictionary, and every row has exactly the same keys. When the values are integers and the keys (strings) are long, the keys occupy more memory than the values.

Can I iterate over the rows in such a way that the keys of each row point to the same key instances, so that I save memory per row?

Note that I don't know the keys in advance - they are taken from the CSV header. Otherwise I could use namedtuple or __slots__.

Martijn Pieters · Accepted Answer · 2013-05-27T09:25:17.543

You can use a namedtuple; build it from the first row yourself:

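import csv
from collections import namedtuple

# 'rb' is for Python 2; on Python 3 use open(filename, newline='')
with open(filename, 'rb') as h:
    data = csv.reader(h)
    headers = next(data)
    RowTuple = namedtuple('RowTuple', headers)
    for row in data:
        row = RowTuple(*row)  # unpack: one positional argument per field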

This is essentially what DictReader() does: it takes the keys from the first row.
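One caveat: namedtuple() only accepts field names that are valid Python identifiers, and CSV headers often are not. A minimal sketch of one way around that, assuming positional replacement names are acceptable, is to pass rename=True. The sample data and its column names are made up for illustration:

```python
import csv
import io
from collections import namedtuple

# Hypothetical in-memory CSV standing in for the real file.
# "unit price" (contains a space) and "class" (a keyword) are invalid field names.
sample = io.StringIO("id,unit price,class\n1,250,3\n2,400,1\n")

reader = csv.reader(sample)
headers = next(reader)

# rename=True replaces each invalid name with an underscore plus its position.
RowTuple = namedtuple('RowTuple', headers, rename=True)

# _make() builds an instance from an iterable, equivalent to RowTuple(*row).
rows = [RowTuple._make(row) for row in reader]

print(RowTuple._fields)
print(rows[0].id, rows[0]._1)
```

The values are still strings, exactly as csv.reader returns them; convert them to int separately if needed.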

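Note that the DictReader() code creates each dictionary with dict(zip(self.fieldnames, row)); this reuses the same key strings for every row, so the strings for the keys are not created anew for each row. The only per-row memory overhead is the dict's hash table itself, which holds references to the keys and values (and, depending on the CPython version, a copy of each key's hash). A string's hash is computed once and cached on the string object; it is not recalculated for each row. The namedtuple approach doesn't create new key strings either, and it needs no per-row hash table: the field names live on the class, and because the class sets __slots__ = (), instances carry no __dict__ and hold only the values.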
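The key-sharing claim can be checked directly. The following is a rough sketch, assuming CPython and a small made-up CSV held in memory; the exact sizes reported by sys.getsizeof vary between Python versions, so treat them as indicative only:

```python
import csv
import io
import sys

# Hypothetical in-memory CSV standing in for the real file.
sample = io.StringIO(
    "a_rather_long_column_name,another_long_column_name\n"
    "1,2\n"
    "3,4\n"
)

rows = list(csv.DictReader(sample))

# Compare key objects by identity ("is"), not just equality:
# every row dict references the same header strings.
keys_shared = all(k1 is k2 for k1, k2 in zip(rows[0], rows[1]))
print(keys_shared)

# Per-row container cost: a dict versus a plain tuple of the same values.
# (A namedtuple instance is laid out like a tuple.)
tuple_row = tuple(rows[0].values())
dict_size = sys.getsizeof(rows[0])
tuple_size = sys.getsizeof(tuple_row)
print(dict_size, tuple_size)
```

So the per-row cost of DictReader comes from the dict container, not from duplicated key strings.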
