I intend to read a file, about 500 MB in total, into a dict, keyed on a field extracted from each line. The code snippet is as follows:
f2 = open("ENST-NM-chr-name.txt", "r") # small amount
lines = [l.strip() for l in f2.readlines() if l.strip()]
sample = dict([(l.split("\t")[2].strip("\""), l) for l in lines]) ## convert [(1,2), (3,4)] to {1:2, 3:4}
When running on a machine with 4 GB of memory, Python raises a MemoryError. If I change the expression assigned to sample to [l for l in lines], it works fine.
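To be concrete, this is the dict-free variant that finishes without error (just a minimal sketch of what I mean, reading the same file as above):

f2 = open("ENST-NM-chr-name.txt", "r")
lines = [l.strip() for l in f2.readlines() if l.strip()]
f2.close()
# keeping the data as a plain list instead of a dict completes on the 4 GB machine
sample = [l for l in lines]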
At first, I thought it was the split method that was consuming lots of memory, so I adjusted my code to this:
def find_nth(haystack, needle, n):
    # return the index of the n-th occurrence of needle in haystack, or -1 if there are fewer than n
    start = haystack.find(needle)
    while start >= 0 and n > 1:
        start = haystack.find(needle, start + len(needle))
        n -= 1
    return start
...
sample = dict([(l[find_nth(l, "\t", 4):].strip(), l) for l in lines])
But the result was the same.
A new discovery is that it runs normally, without OOM, provided I remove the dict() conversion, regardless of the rest of the code logic.
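For example, keeping the exact same key extraction but leaving out the dict() call (again, just a sketch of what I mean) runs to completion:

# same (key, line) pairs as before, but left as a list of tuples instead of a dict
sample = [(l.split("\t")[2].strip("\""), l) for l in lines]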
Could anyone give me some idea of what is going on here?