I think you are referring to the following paragraph of the PEP (in the Split-Table dictionaries section):
When resizing a split dictionary it is converted to a combined table. If resizing is as a result of storing an instance attribute, and there is only instance of a class, then the dictionary will be re-split immediately. Since most OO code will set attributes in the __init__ method, all attributes will be set before a second instance is created and no more resizing will be necessary as all further instance dictionaries will have the correct size.
So a dictionary's keys will remain shared, no matter what additions are made, before a second instance can be created. Setting the attributes in __init__ is the most logical way of achieving this.
This does not mean that attributes set at a later time are not shared; they can still be shared between instances, so long as you don't cause any of the dictionaries to be combined. So after you create a second instance, the keys stop being shared only if any of the following happens:
- a new attribute causes the dictionary to be resized.
- a new attribute is not a string attribute (dictionaries are highly optimised for the common all-keys-are-strings case).
- an attribute is inserted in a different order; for example, a.foo = None is set first, and then a second instance b sets b.bar = None first. Here b has an incompatible insertion order, as the shared dictionary has foo first.
- an attribute is deleted. This kills sharing even for one instance. Don't delete attributes if you care about shared dictionaries.
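As a rough illustration of the cases listed above, here is a minimal sketch of the operations that can stop an instance from sharing keys on Python versions before 3.11. The class name Foo and the instance names are only placeholders, and the snippet shows the triggering operations themselves; it does not detect whether sharing actually stopped.

```python
class Foo:
    pass


a, b, c, d = Foo(), Foo(), Foo(), Foo()

# The first instance establishes the shared key order: foo, then bar.
a.foo = None
a.bar = None

# Different insertion order: b sets bar before foo, which does not match
# the order recorded in the shared keys.
b.bar = None
b.foo = None

# Non-string key: only possible by writing to the instance dictionary directly.
vars(c)[42] = None

# Deletion: before Python 3.11 this stops sharing even for a single instance.
d.foo = None
del d.foo

print(list(vars(a)))  # ['foo', 'bar']
print(list(vars(b)))  # ['bar', 'foo']
```

Adding many attributes to a single instance (the resize case) is not shown here, since the exact threshold depends on the Python version.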
Python 3.11 improved shared-key dictionaries considerably, however. The values for the shared dictionary are inlined into an array as part of the instance, as long as there are fewer than 30 unique attributes (across all instances of a class), and deleting an attribute or inserting keys in a different order no longer affect dictionary key sharing.
So from the moment you have two instances (and two dictionaries sharing keys), a dictionary that gets combined won't be re-split; but as long as you don't trigger any of the above cases, your instances will continue to share keys.
It also means that delegating attribute setting to a helper method called from __init__ is not going to affect the above scenario; those attributes are still set before a second instance is created. After all, __init__ can't return before that helper method has returned.
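A minimal sketch of that delegation pattern, assuming a hypothetical Point class with a hypothetical _init_extras() helper (neither name comes from the PEP). Every attribute, including the ones set by the helper, exists before the constructor call for the first instance returns, so a second instance can only be created afterwards.

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y
        # Delegating to a helper still runs before __init__ returns,
        # so these attributes are set before any second instance exists.
        self._init_extras()

    def _init_extras(self):
        self.label = "unnamed"
        self.visible = True


first = Point(0, 0)
second = Point(1, 2)

# Both instances end up with the same attributes in the same order.
print(list(vars(first)))   # ['x', 'y', 'label', 'visible']
print(list(vars(second)))  # ['x', 'y', 'label', 'visible']
```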
In other words, you should not worry too much about where you set your attributes. Setting them in the __init__ method lets you avoid combining scenarios more easily, but any attribute set before a second instance is created is guaranteed to be part of the shared keys.
There is no good way to detect from Python code whether a dictionary is split or combined, at least not reliably across versions. We can, however, access the C implementation details by using the ctypes module. Given a pointer to a dictionary and the C header definition of a dictionary, you can test if the ma_values field is NULL. If not, it is a shared dictionary:
import ctypes


class PyObject(ctypes.Structure):
    """Python object header"""
    _fields_ = [
        ("ob_refcnt", ctypes.c_ssize_t),
        ("ob_type", ctypes.c_void_p),  # PyTypeObject*
    ]


class PyDictObject(ctypes.Structure):
    """A dictionary object."""
    _fields_ = [
        ("ob_base", PyObject),
        ("ma_used", ctypes.c_ssize_t),
        ("ma_version_tag", ctypes.c_uint64),
        ("ma_keys", ctypes.c_void_p),  # PyDictKeysObject*
        ("ma_values", ctypes.c_void_p),  # PyObject** or PyDictValues*
    ]


Py_TPFLAGS_MANAGED_DICT = 1 << 4


def has_inlined_attributes(obj):
    """Test if an instance has inlined attributes (Python 3.11)"""
    if not type(obj).__flags__ & Py_TPFLAGS_MANAGED_DICT:
        return False
    # the (inlined) values pointer is stored in the pre-header at offset -4
    # (-3 is the dict pointer, remainder is the GC header)
    return bool(ctypes.cast(id(obj), ctypes.POINTER(ctypes.c_void_p))[-4])


def is_shared(d):
    """Test if the __dict__ of an instance is a PEP 412 shared dictionary"""
    # Python 3.11 inlines the (shared dictionary) values as an array, unless you
    # access __dict__. Don't clobber the inlined values.
    if has_inlined_attributes(d):
        return True
    cdict = ctypes.cast(id(d.__dict__), ctypes.POINTER(PyDictObject)).contents
    # if the ma_values pointer is not null, it's a shared dictionary
    return bool(cdict.ma_values)
A quick demo (using Python 3.10):
>>> class Foo:
...     pass
...
>>> a, b = Foo(), Foo() # two instances
>>> is_shared(a), is_shared(b) # they both share the keys
(True, True)
>>> a.bar = 'baz' # adding a single key
>>> is_shared(a), is_shared(b) # no change, the keys are still shared!
(True, True)
>>> a.spam, a.ham, a.monty, a.eric = (
... 'eggs', 'eggs and spam', 'python',
... 'idle') # more keys still
>>> is_shared(a), is_shared(b) # no change, the keys are still shared!
(True, True)
>>> a.holy, a.bunny, a.life = (
... 'grail', 'of caerbannog',
... 'of brian') # more keys, resize time
>>> is_shared(a), is_shared(b) # oops, we killed it
(False, True)
Only when the threshold was reached (for an empty dictionary with 8 spare slots, the resize takes place when you add a 6th key) did the dictionary for instance a lose the shared property. (Later Python releases may push that resize point out further.)
Dictionaries are resized when they are about two-thirds full, and a resize generally doubles the table size. So the next resize will take place when the 11th key is added, then at the 22nd, then the 43rd, and so on. A large instance dictionary therefore gives you a lot more breathing room.
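The arithmetic behind those numbers can be sketched as follows. This is a simplified model, assuming a table that holds at most two-thirds of its size (rounded down) and doubles on every resize; the real growth policy differs between CPython versions, and the function names here are hypothetical.

```python
def usable_slots(table_size):
    """Number of keys a table of this size can hold before it must grow."""
    return (2 * table_size) // 3


def resize_points(initial_size=8, steps=4):
    """Return the key counts at which each successive resize is triggered."""
    points = []
    size = initial_size
    for _ in range(steps):
        # The resize happens when one more key than the usable slots is added.
        points.append(usable_slots(size) + 1)
        size *= 2  # simplified: assume the table doubles every time
    return points


print(resize_points())  # [6, 11, 22, 43]
```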
For Python 3.11, it takes a little longer still before is_shared() returns False; you need to insert 30 attributes:
>>> import sys, secrets
>>> sys.version_info
sys.version_info(major=3, minor=11, micro=0, releaselevel='final', serial=0)
>>> class Foo: pass
...
>>> a = Foo()
>>> count = 0
>>> while is_shared(a):
...     count += 1
...     setattr(a, secrets.token_urlsafe(), 42)
...
>>> count
30