You're actually getting an incomplete picture of memory use here. The total size of a dictionary more than doubles at irregular intervals, and if you compare the sizes of the two structures right after the dictionary has grown, the dictionary is the larger one again. A simple script with a recursive size function (see code below) shows a pretty clear pattern:
i: 2 list size: 296 dict size: 328 difference: -32
i: 3 list size: 392 dict size: 352 difference: 40
i: 4 list size: 488 dict size: 376 difference: 112
i: 5 list size: 616 dict size: 400 difference: 216
i: 7 list size: 808 dict size: 1216 difference: -408
i: 10 list size: 1160 dict size: 1288 difference: -128
i: 13 list size: 1448 dict size: 1360 difference: 88
i: 17 list size: 1904 dict size: 1456 difference: 448
i: 23 list size: 2480 dict size: 3904 difference: -1424
i: 31 list size: 3328 dict size: 4096 difference: -768
i: 42 list size: 4472 dict size: 4360 difference: 112
i: 56 list size: 5912 dict size: 4696 difference: 1216
i: 74 list size: 7880 dict size: 5128 difference: 2752
i: 100 list size: 10520 dict size: 14968 difference: -4448
i: 133 list size: 14024 dict size: 15760 difference: -1736
i: 177 list size: 18672 dict size: 16816 difference: 1856
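Those jumps in the dict column come from hash-table resizes. Here's a quick sketch (Python 3 syntax; exact byte counts depend on the interpreter version and platform) that records the shallow size of a dict after each insertion and reports where it jumped:

```python
import sys

# Shallow dict size as keys are added: sys.getsizeof(d) stays flat,
# then jumps each time the dict's hash table is resized.
d = {}
sizes = []
for k in range(50):
    d[k] = k
    sizes.append(sys.getsizeof(d))

# Item counts at which a resize happened.
jumps = [i + 1 for i in range(1, len(sizes)) if sizes[i] > sizes[i - 1]]
print(jumps)
```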
This pattern continues as i grows. (You can test this using your method -- try setting i near 2636744. The size of the dictionary is larger at that point, at least for me.) Martijn is right that the tuples in the list of tuples add to the memory overhead, canceling out the memory advantage of lists over dictionaries. But the result, on average, is not that the dictionary is better; it's that the two are about the same. So in answer to your original question:
When you want to store LOTS of key-value data in memory, which data structure is more memory-efficient, a dict or a list of tuples?
It doesn't really matter if all you're concerned about is memory.
However, note that iterating over a dictionary is often a bit slower than iterating over a list, because there's no good way to avoid iterating over all the empty bins in the dictionary. So there's a bit of a tradeoff -- dictionaries are (much) faster at doing random key lookups, but lists are (a bit) faster at iteration. The dictionary will probably be better most of the time, but in some rare cases, the list may provide a micro-optimization.
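If you want to measure the iteration tradeoff on your own machine, a rough timeit sketch like this works (Python 3 syntax; absolute timings are machine- and version-dependent, and newer CPython dict implementations narrow the gap):

```python
import timeit

# Time a full iteration over a list of tuples vs. a dict's items.
# The structures are built in setup so only iteration is timed.
setup = (
    "lst = [(x, x) for x in range(10000)]\n"
    "dct = {x: x for x in range(10000)}"
)
t_list = timeit.timeit("for k, v in lst: pass", setup=setup, number=200)
t_dict = timeit.timeit("for k, v in dct.items(): pass", setup=setup, number=200)
print("list: %.4fs  dict: %.4fs" % (t_list, t_dict))
```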
Here's the code that tests for size. It probably won't generate correct results for all corner cases, but it should handle simple structures like this without any problems. (But let me know if you see any issues.)
    import sys, collections, itertools

    def totalsize(x):
        seen = set()
        return ts_rec(x, seen)

    def ts_rec(x, seen):
        # Don't double-count objects we've already visited.
        if id(x) in seen:
            return 0
        seen.add(id(x))

        x_size = sys.getsizeof(x)
        if isinstance(x, collections.Mapping):
            kv_chain = itertools.chain.from_iterable(x.iteritems())
            return x_size + sum(ts_rec(i, seen) for i in kv_chain)
        elif isinstance(x, collections.Sequence):
            return x_size + sum(ts_rec(i, seen) for i in x)
        else:
            return x_size

    for i in (10 ** (e / 8.0) for e in range(3, 19)):
        i = int(i)
        lsize = totalsize([(x, x) for x in xrange(i)])
        dsize = totalsize(dict((x, x) for x in xrange(i)))
        print "i: ", i,
        print " list size: ", lsize, " dict size: ", dsize,
        print " difference: ", lsize - dsize
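If you're on Python 3, here's a rough equivalent sketch (using collections.abc, dict.items(), and print(); the absolute sizes it reports won't match the Python 2 numbers above, since CPython's dict internals have changed, but the measurement idea is the same):

```python
import sys
from collections.abc import Mapping, Sequence
from itertools import chain

def totalsize(x, seen=None):
    # Recursively sum sys.getsizeof over a structure, counting each
    # object only once (shared objects are not double-counted).
    if seen is None:
        seen = set()
    if id(x) in seen:
        return 0
    seen.add(id(x))

    size = sys.getsizeof(x)
    if isinstance(x, Mapping):
        kv_chain = chain.from_iterable(x.items())
        return size + sum(totalsize(i, seen) for i in kv_chain)
    # Exclude str/bytes, which are Sequences of themselves.
    if isinstance(x, Sequence) and not isinstance(x, (str, bytes)):
        return size + sum(totalsize(i, seen) for i in x)
    return size

for e in range(3, 19):
    i = int(10 ** (e / 8.0))
    lsize = totalsize([(x, x) for x in range(i)])
    dsize = totalsize({x: x for x in range(i)})
    print("i:", i, " list size:", lsize, " dict size:", dsize,
          " difference:", lsize - dsize)
```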