In python dict
is really a list
with a hash function to generate indices.
The hash function is exposed by python:
>>> hash
<built-in function hash>
Note that the hash function was changed in python 3:
Python2.7.10
>>>>>> hash("foo")
-740391237
Python3.5.0
>>> hash("foo")
866150152168011056
When you type
mydict = {}
What really happens is that python will allocate an empty list with size 8.
When you now start adding items to the mydict
python will calculate the hash values of the items and consider the 3 least significant bits thereof to calculate its index in the list:
def bits(integer):
return "".join(str(x) for x in [1&(integer>>i) for i in range(32)[::-1]])
>>>for item in "myitem","hashfunction","python":
print(bits(hash(item))[-3:])
101
100
000
So a dict with these keys will have a different order than you expect:
>>> mydict={}
>>> mydict["myitem"]=None
>>> mydict["hashfunction"]=None
>>> mydict["python"]=None
>>> print mydict
{'python': None, 'hashfunction': None, 'myitem': None}
They're the order of the last three digits of the hash in the dict.
As the dict gets fuller, python will reallocate it and use a different hash, for small dictionaries ( up to 128k) it will quadruple its size, for larger dicts it will *double its size**. This reallocation happens when the dict gets 2/3 full.
>>> keys=["myitem","hashfunction","python","in","a","super","large","dict"]
>>> for item in keys:
print(item, bits(hash(item))[-5:])
('myitem', '01101')
('hashfunction', '00100')
('python', '01000')
('in', '10111')
('a', '00000')
('super', '11100')
('large', '10000')
('dict', '10100')
>>>mydict={key:None for key in keys}
>>>print mydict
{'a': None, 'hashfunction': None, 'python': None,
'myitem': None, 'large': None, 'dict': None, 'in': None, 'super': None}
This means that the order in a dict
will change while you're putting in more items, sometimes radically.
Note that a dict will only ever increase its size, never shrink as you del
items from it.
To know more about dict
and how it handles hash collisions I recommend Brandon Rhodes' excellent Pycon2010 talk on the inner workings of dict
: The mighty dictionary
Bottom line is that in a dict
you should never rely on its order.
Raymond Hettinger implemented an OrderedDict
class in the collections
module.
from collections import OrderedDict
d = OrderedDict()
for num in range(5):
d[num] = num
print d #OrderedDict([(0, 0), (1, 1), (2, 2), (3, 3), (4, 4)])
It inherits from dict
but wraps around some code to remember the order in which the keys were added. You can rely on the ordering of OrderedDict
.