
I've been working on a project that involves loading a relatively large dictionary into memory from a file. The dictionary has just under 2 million entries, and each entry (key and value combined) is under 20 bytes. The file is 38 MB on disk.

My problem is that when I try to load the dictionary, my program's memory usage immediately jumps to over 2.5 GB.

Here is the code I use to read the dictionary in from disk:

f = open('someFile.txt', 'r')
rT = eval(f.read())
f.close()
dckrooney

2 Answers


I think the memory is being used to build the AST while parsing the dictionary literal.

For this kind of use it's much better to go with the cPickle module instead of repr/eval.

import cPickle

# Build a sample dictionary with a million small entries.
x = {}
for i in xrange(1000000):
    x["k%i" % i] = "v%i" % i

# Dump with the highest pickle protocol (-1): a compact binary format.
with open("data", "wb") as f:
    cPickle.dump(x, f, -1)

# Load it back; this parses the binary format instead of Python source.
with open("data", "rb") as f:
    x = cPickle.load(f)

Passing -1 when dumping means using the latest pickle protocol, which is more efficient but possibly not backward compatible with older Python versions. Whether this is a good idea depends on why you need to dump/load.
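As a rough illustration (a sketch assuming Python 2, where the highest pickle protocol is 2, and hypothetical output file names), any negative protocol number selects cPickle.HIGHEST_PROTOCOL, and the binary protocol typically produces a smaller, faster-to-parse file than the ASCII protocol 0:

import os
import cPickle

# A small sample dictionary, just for comparing the two protocols.
data = dict(("k%i" % i, "v%i" % i) for i in xrange(1000))

with open("data_p0", "wb") as f:
    cPickle.dump(data, f, 0)    # protocol 0: ASCII, readable by very old Pythons

with open("data_p2", "wb") as f:
    cPickle.dump(data, f, -1)   # same as cPickle.HIGHEST_PROTOCOL (2 here): compact binary

print os.path.getsize("data_p0"), os.path.getsize("data_p2")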

6502
  • You might also want to use the json module (see the sketch after these comments). – Winston Ewert May 07 '11 at 21:50
  • Shelve is a good alternative too. It's designed for huge dictionaries which may be partially stored on disk. – Nathan May 08 '11 at 00:20
  • Thanks! I haven't had a chance to implement this yet, but I read up on pickle a little bit; it seems like that should fix the problem. – dckrooney May 08 '11 at 17:53
  • I ended up using cPickle, which worked perfectly... Memory footprint is down to a more reasonable level, and the dictionary loads MUCH faster. Thanks! – dckrooney May 09 '11 at 02:54
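Following up on the json and shelve suggestions in the comments above, here is a minimal sketch (Python 2 assumed, file names hypothetical): json keeps the data as human-readable text but still loads everything into memory at once, while shelve keeps the dictionary on disk and fetches entries on demand.

import json
import shelve

d = dict(("k%i" % i, "v%i" % i) for i in xrange(1000))

# json: plain-text file, parsed back into an ordinary in-memory dict on load.
with open("someFile.json", "w") as f:
    json.dump(d, f)
with open("someFile.json") as f:
    d2 = json.load(f)

# shelve: disk-backed dictionary; values are read from disk as they are accessed.
db = shelve.open("someFile.db")
db.update(d)      # keys must be strings
print db["k42"]   # fetched from the file, not from an in-memory dict
db.close()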

This may be a bit off-topic, but using generator expressions can also help tremendously when working with big files or streams of data.

This discussion explains it very well, and this presentation changed the way I wrote my programs.
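For example, here is a minimal sketch (with a hypothetical big.txt file): a generator expression feeds lines to sum() one at a time, so the whole file never has to be held in memory at once.

# Sum the line lengths of a large file without building an intermediate list.
with open("big.txt") as f:
    total = sum(len(line) for line in f)  # one line in memory at a time
print total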

Morten Jensen