
I have a question about dictionary behavior in Python when the keys are numbers. In my case, when I print a dictionary with integer keys, the output is sorted by key, but in the other case (string keys) the dictionary appears unordered. I want to understand the rule behind this.

l = {"one" : "1", "two" : "2", "three" : "3"}

print(l)

l = {1: "one", 2: "two", 3: "three", 4: "four", 5: "five"}

print(l)

l = {2: "two", 3: "three", 4: "four", 1: "one", 5: "five"}

print(l)

result:

{'three': '3', 'two': '2', 'one': '1'}

{1: 'one', 2: 'two', 3: 'three', 4: 'four', 5: 'five'}

{1: 'one', 2: 'two', 3: 'three', 4: 'four', 5: 'five'}
Cœur
  • Python dictionaries are intrinsically unsorted. You can't count on their order and it won't be preserved. – kylieCatt May 29 '15 at 14:43
  • The reason your integer key dictionaries appear in the same order probably has to do with how Python caches small numbers. Print this dict `{500000: 'five', 400000: 'four', 30000: 'three', 200000000: 'two', 10: 'one'}` and you'll see the numerical order is no longer preserved. – kylieCatt May 29 '15 at 14:45
  • Thanks, I got my answer. – Faraz Molaee May 29 '15 at 16:19

1 Answer


Python uses a hash table to store dictionaries, so there is no ordering in dictionaries or in other objects that use a hash function.

But regarding the indices of items in a hash table, Python calculates each index based on the following code from hashtable.c:

key_hash = ht->hash_func(key);
index = key_hash & (ht->num_buckets - 1);

Since the hash value of an integer is the integer itself, the index depends directly on the number (ht->num_buckets - 1 is a constant), so the index is computed by a bitwise AND between (ht->num_buckets - 1) and the number.
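As a quick illustration, here is a minimal sketch in Python that mimics this slot computation; the bucket count of 8 is an assumption for the example, not something CPython guarantees (real tables grow as items are added):

num_buckets = 8  # assumed table size, for illustration only

for key in (1, 2, 3, 4, 5):
    # In CPython, hash(n) == n for small non-negative integers.
    index = hash(key) & (num_buckets - 1)
    print("%d -> slot %d" % (key, index))

Consecutive small integer keys therefore land in consecutive slots, which is why such a dictionary prints in numerical order.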

Consider the following example with set, which also uses a hash table:

>>> set([0,1919,2000,3,45,33,333,5])
set([0, 33, 3, 5, 45, 333, 2000, 1919])

For the number 33 we have:

33 & (ht->num_buckets - 1) = 1

which in binary is:

'0b100001' & '0b111' = '0b1' # 1, the index of 33

Note that in this case (ht->num_buckets - 1) is 8 - 1 = 7, or 0b111.

And for 1919:

'0b11101111111' & '0b111' = '0b111' # 7, the index of 1919

And for 333:

'0b101001101' & '0b111' = '0b101' # 5, the index of 333
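All of these slot indices can be reproduced at once with a short sketch (following the answer's assumption of an 8-bucket table, so the mask is 7; the actual table may be larger, and colliding keys get moved by collision resolution):

mask = 7  # ht->num_buckets - 1 for an assumed 8-bucket table
for n in (0, 1919, 2000, 3, 45, 33, 333, 5):
    print("%4d & %d = %d" % (n, mask, n & mask))

Note that 45, 333, and 5 all map to slot 5 here (and 0 and 2000 both map to slot 0); open addressing then probes for free slots, which is why the printed set order is close to, but not exactly, this index order.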

For more details about Python's hash function, it is worth reading the following quote from the Python source code:

Major subtleties ahead: Most hash schemes depend on having a "good" hash function, in the sense of simulating randomness. Python doesn't: its most important hash functions (for strings and ints) are very regular in common cases:

>>> map(hash, (0, 1, 2, 3))
  [0, 1, 2, 3]
>>> map(hash, ("namea", "nameb", "namec", "named"))
  [-1658398457, -1658398460, -1658398459, -1658398462]

This isn't necessarily bad! To the contrary, in a table of size 2**i, taking the low-order i bits as the initial table index is extremely fast, and there are no collisions at all for dicts indexed by a contiguous range of ints. The same is approximately true when keys are "consecutive" strings. So this gives better-than-random behavior in common cases, and that's very desirable.

OTOH, when collisions occur, the tendency to fill contiguous slices of the hash table makes a good collision resolution strategy crucial. Taking only the last i bits of the hash code is also vulnerable: for example, consider the list [i << 16 for i in range(20000)] as a set of keys. Since ints are their own hash codes, and this fits in a dict of size 2**15, the last 15 bits of every hash code are all 0: they all map to the same table index.

But catering to unusual cases should not slow the usual ones, so we just take the last i bits anyway. It's up to collision resolution to do the rest. If we usually find the key we're looking for on the first try (and, it turns out, we usually do -- the table load factor is kept under 2/3, so the odds are solidly in our favor), then it makes best sense to keep the initial index computation dirt cheap.
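The [i << 16 for i in range(20000)] scenario from the quote can be checked directly; this sketch just verifies that every such key lands on the same initial slot of a 2**15-bucket table (assuming CPython, where small non-negative integers hash to themselves):

mask = 2 ** 15 - 1  # keep only the last 15 bits
keys = [i << 16 for i in range(20000)]
# Every key's low 16 bits are zero, so hash(key) & mask is always 0.
print(len(set(hash(k) & mask for k in keys)))  # prints 1: one shared slot

Every key maps to slot 0, so collision resolution does all the work of spreading them through the table.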

Mazdak