Python uses a hash table for storing dictionaries, so there is no meaningful order in dictionaries or in other objects that use a hash function. (The examples below are from CPython 2; since CPython 3.7 dicts preserve insertion order, though sets remain unordered.)
Regarding the indices of items in a hash object, Python calculates them with the following code from hashtable.c:
key_hash = ht->hash_func(key);
index = key_hash & (ht->num_buckets - 1);
Since the hash value of a (small) integer is the integer itself, the index depends directly on the number: it is the bitwise AND of the number and (ht->num_buckets - 1). Note that ht->num_buckets is always a power of two (it grows as the table fills), so the mask (ht->num_buckets - 1) simply keeps the low-order bits of the hash.
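As a quick sanity check of the two facts used above, here is a minimal REPL sketch (the 32-bucket table size is just an illustrative assumption):

>>> hash(33)       # small integers hash to themselves in CPython
33
>>> mask = 32 - 1  # a hypothetical 32-bucket table, so the mask is 0b11111
>>> 33 & mask      # index = key_hash & (ht->num_buckets - 1)
1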
Consider the following example with a set, which also uses a hash table:
>>> set([0,1919,2000,3,45,33,333,5])
set([0, 33, 3, 5, 45, 333, 2000, 1919])
Note that in this case (ht->num_buckets - 1) is 32-1=31, or 0b11111: the table starts with 8 buckets, but it is resized as elements are inserted, and by the time all eight elements are in it holds 32 buckets.
For the number 33 we have:

33 & (ht->num_buckets - 1) = 1

which is actually:

0b100001 & 0b11111 = 0b00001  # 1, the index of 33

And for 1919:

0b11101111111 & 0b11111 = 0b11111  # 31, the index of 1919

And for 333:

0b101001101 & 0b11111 = 0b01101  # 13, the index of 333

Note that 45 also maps to index 13 (0b101101 & 0b11111 = 0b01101), so it collides with 333; collision resolution moves 333 to a nearby free slot, which is why 333 appears right after 45 in the output. Sorting the eight elements by their indices (0, 33, 3, 5, 45, 333, 2000, 1919 land at slots 0, 1, 3, 5, 13, 13→probed, 16, 31) reproduces exactly the iteration order shown above.
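This reasoning can be checked from the REPL by pairing each element with its masked hash; assuming CPython 2 and the 32-bucket table described above, the bucket indices come out in the same order as the set iterates:

>>> s = set([0, 1919, 2000, 3, 45, 33, 333, 5])
>>> [(x, x & 31) for x in s]  # (element, bucket index) in iteration order
[(0, 0), (33, 1), (3, 3), (5, 5), (45, 13), (333, 13), (2000, 16), (1919, 31)]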
For more details about Python's hash function, it is worth reading the following quote from the Python source code:
Major subtleties ahead: Most hash schemes depend on having a "good" hash
function, in the sense of simulating randomness. Python doesn't: its most
important hash functions (for strings and ints) are very regular in common
cases:
>>> map(hash, (0, 1, 2, 3))
[0, 1, 2, 3]
>>> map(hash, ("namea", "nameb", "namec", "named"))
[-1658398457, -1658398460, -1658398459, -1658398462]
This isn't necessarily bad! To the contrary, in a table of size 2**i, taking
the low-order i bits as the initial table index is extremely fast, and there
are no collisions at all for dicts indexed by a contiguous range of ints.
The same is approximately true when keys are "consecutive" strings. So this
gives better-than-random behavior in common cases, and that's very desirable.
OTOH, when collisions occur, the tendency to fill contiguous slices of the
hash table makes a good collision resolution strategy crucial. Taking only
the last i bits of the hash code is also vulnerable: for example, consider
the list [i << 16 for i in range(20000)] as a set of keys. Since ints are
their own hash codes, and this fits in a dict of size 2**15, the last 15 bits
of every hash code are all 0: they all map to the same table index.
But catering to unusual cases should not slow the usual ones, so we just take
the last i bits anyway. It's up to collision resolution to do the rest. If
we usually find the key we're looking for on the first try (and, it turns
out, we usually do -- the table load factor is kept under 2/3, so the odds
are solidly in our favor), then it makes best sense to keep the initial index
computation dirt cheap.
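The pathological case described in that quote is easy to reproduce; this sketch relies only on the fact (stated in the quote) that ints are their own hash codes:

>>> keys = [i << 16 for i in range(20000)]
>>> [hash(k) & (2**15 - 1) for k in keys[:5]]  # low 15 bits of each hash
[0, 0, 0, 0, 0]

Every key's low 15 bits are zero, so in a table of 2**15 buckets they would all map to slot 0, and collision resolution has to do all the work.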