I need to import a very large dictionary into Python and I'm running into some unexpected memory bottlenecks. The dictionary has the form,
d = {(1,2,3):(1,2,3,4), (2,5,6):(4,2,3,4,5,6), ... }
So each key is a 3-tuple and each value is a relatively small tuple of arbitrary size (probably never more than 30 elements). What makes the dictionary large is the number of keys: a smaller example of what I'm working with has roughly 247257 keys. I generate this dictionary through a simulation, so I can write out a text file that defines it, and for the example I just mentioned that file is 94MB.

The bottleneck I am running into is that the initial compile to Python byte code eats up about 14GB of RAM. So the first time I import the dictionary I see the RAM usage spike up, and after a good 10 seconds everything is loaded. If the .pyc file is already generated, the import is nearly instant. Using pympler, I've determined that this dictionary is only about 200 MB in memory (see the sketch at the end of the post for how I measured that).

What is the deal here? Do I have any other options for how to get this dictionary loaded into Python, or at least compiled to byte code? I'm running the generating simulations in C++ and I can write files in whatever format I need. Are there any options there (Python libraries, etc.)? I'm interfacing with some software that needs this data as a dictionary, so please no other suggestions in that realm.

Also, just in case you are wondering, I have defined the dictionary in the text file like the definition above as well as like so,
d = {}
d[1,2,3] = (1,2,3,4)
d[2,5,6] = (4,2,3,4,5,6)
...
Both give the same memory spike during the compile to byte code. In fact, the second one seems to be slightly worse, which is surprising to me. There's got to be some way to tame the amount of RAM the initial compile needs. It seems like it should somehow be able to do the compile one key-value pair at a time. Any ideas?
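To make that concrete, something along these lines is the kind of alternative I have in mind (just a rough sketch: the file name and the one-record-per-line format are made up, since I can change the C++ output to whatever is convenient):

# Build the dict one pair at a time from a plain data file instead of
# compiling a .py module. Each line would look like "1 2 3 : 1 2 3 4"
# (hypothetical format; dict_data.txt is a made-up name).
d = {}
with open('dict_data.txt') as f:
    for line in f:
        key_part, _, value_part = line.partition(':')
        key = tuple(int(x) for x in key_part.split())
        value = tuple(int(x) for x in value_part.split())
        d[key] = value

If there's an existing library that already handles this sort of thing better, that would work for me too.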
Other info: I'm using Python 2.6.5.
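For reference, this is roughly how I got the ~200 MB figure with pympler (a minimal sketch; "bigdict" is just a placeholder for my generated .py file):

# Import the generated module (the slow, RAM-hungry step described above)
# and measure the total in-memory size of the dictionary it defines.
from pympler import asizeof

import bigdict  # placeholder name for the generated file

print asizeof.asizeof(bigdict.d)  # total size in bytes, about 200 MB for me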