I am writing a very simple script that counts the number of occurrences of each key pair in a file. The file is about 300 MB (15 million lines) and has 3 columns. Since I am reading the file line by line, I don't expect Python to use much memory: at most slightly above 300 MB to store the counts dictionary.
However, when I look at Activity Monitor, the memory usage goes above 1.5 GB. What am I doing wrong? If this is normal, could someone explain why? Thanks.
import csv

def get_counts(filepath):
    with open(filepath, 'rb') as csvfile:
        reader = csv.DictReader(csvfile, fieldnames=['col1', 'col2', 'col3'], delimiter=',')
        counts = {}
        for row in reader:
            # Use the first two columns (as ints) as the key.
            key1 = int(row['col1'])
            key2 = int(row['col2'])
            if (key1, key2) in counts:
                counts[key1, key2] += 1
            else:
                counts[key1, key2] = 1
        return counts
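
Would an equivalent version using csv.reader and collections.Counter behave any differently? A sketch of what I mean (untested on the same machine; it assumes every row really has exactly three columns), which avoids the per-row dict that DictReader builds:

import csv
from collections import Counter

def get_counts_counter(filepath):
    # Same counting logic, but csv.reader yields a plain list per row
    # instead of the dict that DictReader builds for every line.
    counts = Counter()
    with open(filepath, 'rb') as csvfile:
        reader = csv.reader(csvfile, delimiter=',')
        for col1, col2, _ in reader:
            # Counter defaults missing keys to 0, so no if/else is needed.
            counts[int(col1), int(col2)] += 1
    return counts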