6

I've read about Raymond Hettinger's new method of implementing compact dicts. This explains why dicts in Python 3.6 use less memory than dicts in Python 2.7-3.5. However there seems to be a difference between the memory used in Python 2.7 and 3.3-3.5 dicts. Test code:

import sys

d = {i: i for i in range(n)}
print(sys.getsizeof(d))
  • Python 2.7: 12568
  • Python 3.5: 6240
  • Python 3.6: 4704

As mentioned I understand the savings between 3.5 and 3.6 but am curious about the cause of the savings between 2.7 and 3.5.

PM 2Ring
  • 54,345
  • 6
  • 82
  • 182
Alex
  • 18,484
  • 8
  • 60
  • 80
  • Hm, never noticed this, nice find. I'm not sure if a change was made so plain dictionaries can benefit from the combined table form (see a Q&A I did [here](https://stackoverflow.com/questions/42419011/why-is-the-dict-of-instances-so-small-in-python-3) on instance dicts) but it might be worth investigating it. I doubt it though :-) – Dimitris Fasarakis Hilliard Jul 28 '17 at 18:21

1 Answers1

10

Turns out this is a red herring. The rules for increasing the size of dicts changed between cPython 2.7 - 3.2 and cPython 3.3 and again at cPython 3.4 (though this change only applies when deletions occur). We can see this using the following code to determine when the dict expands:

import sys

size_old = 0
for n in range(512):
    d = {i: i for i in range(n)}
    size = sys.getsizeof(d)
    if size != size_old:
        print(n, size_old, size)
    size_old = size

Python 2.7:

(0, 0, 280)
(6, 280, 1048)
(22, 1048, 3352)
(86, 3352, 12568)

Python 3.5

0 0 288
6 288 480
12 480 864
22 864 1632
44 1632 3168
86 3168 6240

Python 3.6:

0 0 240
6 240 368
11 368 648
22 648 1184
43 1184 2280
86 2280 4704

Keeping in mind that dicts resize when they get to be 2/3 full, we can see that the cPython 2.7 dict implementation quadruples in size when it expands while the cPython 3.5/3.6 dict implementations only double in size.

This is explained in a comment in the dict source code:

/* GROWTH_RATE. Growth rate upon hitting maximum load.
 * Currently set to used*2 + capacity/2.
 * This means that dicts double in size when growing without deletions,
 * but have more head room when the number of deletions is on a par with the
 * number of insertions.
 * Raising this to used*4 doubles memory consumption depending on the size of
 * the dictionary, but results in half the number of resizes, less effort to
 * resize.
 * GROWTH_RATE was set to used*4 up to version 3.2.
 * GROWTH_RATE was set to used*2 in version 3.3.0
 */
Alex
  • 18,484
  • 8
  • 60
  • 80
  • 1
    It'd be nice if Raymond Hettinger sees this question and tells us why they made that change... – PM 2Ring Jul 24 '17 at 16:36
  • 1
    @PM2Ring I've got the [next best thing](https://www.youtube.com/watch?v=p33CVV29OG8&feature=youtu.be&t=21m57s) – Alex Jul 24 '17 at 16:48