2

I don't understand this and it's going to bother me until I do.

This python code counts the number of times each character appears in the 'message' variable:

message = 'Some random string of words'

dictionary= {}

for character in message.upper():
    dictionary.setdefault(character,0)
    dictionary[character] = dictionary[character] + 1

print(dictionary)

If you run this multiple times, you will notice the counts are returned in seemingly random order each time. Why is this? I would think that the loop should start at the beginning of the character string each time and return the values in a consistent order...but they don't. Is there some element of randomness in the setdefault(), print(), or upper() methods that impacts the order of processing of the string?

pzp
  • 6,249
  • 1
  • 26
  • 38
DCaugs
  • 484
  • 5
  • 16
  • Dictionary is **set** of key-value pairs. Not a list. A set. And sets have no order. – UltraInstinct Aug 05 '16 at 01:54
  • http://stackoverflow.com/questions/1867861/python-dictionary-keep-keys-values-in-same-order-as-declared – Abdou Aug 05 '16 at 01:54
  • @SuperSaiyan - thank you for the feedback. I understand that Dictionaries are NOT ordered, I'm more trying to understand the WHY. It just seems counter intuitive to me that the same basic code would return values in a random order...I'm curious as to the internals of how that happens. – DCaugs Aug 05 '16 at 02:06
  • 1
    @DCaugs Added some code that satisfies your desired goal to my answer. Hope you find it useful. – pzp Aug 05 '16 at 16:24

5 Answers5

5

Because of two things:

  • Dictionaries "aren't ordered". You of course get some order, but it depends, among other things, on the hash values of the keys.
  • You use (single-character) strings as keys, and string hashes are randomized. If you do print(hash(message)) or even just print(hash('c')) then you'll see that that differs from one run to the next as well.

So since the order depends on the hashes and the hashes change from one run to the next, of course you can get different orders.

On the other hand, if you repeat it in the same run, you'll likely get the same order:

message = 'Some random string of words'
for _ in range(10):
    dictionary= {}
    for character in message:
        dictionary.setdefault(character,0)
        dictionary[character] = dictionary[character] + 1
    print(dictionary)

I just ran that and it printed the exact same order all ten times, as expected. Then I ran it again, and it printed a different order, but again all ten times the same. As expected.

Stefan Pochmann
  • 27,593
  • 8
  • 44
  • 107
  • Ah - of course...that makes perfect sense! I was missing the hash element of the scenario...Thank you! – DCaugs Aug 05 '16 at 02:27
  • 1
    @DCaugs is more obvious in other languages, which explicitly call their `dictionary` equivalent a `HashMap`. – RoadieRich Aug 05 '16 at 17:47
2

dicts are inherently unordered.

From the Python docs:

Keys and values are iterated over in an arbitrary order which is non-random, varies across Python implementations, and depends on the dictionary’s history of insertions and deletions.

EDIT

An alternative to your code that correctly accomplishes your goal is to use an OrderedCounter:

from collections import Counter, OrderedDict

class OrderedCounter(Counter, OrderedDict):
    'Counter that remembers the order elements are first encountered'

    def __repr__(self):
        return '%s(%r)' % (self.__class__.__name__, OrderedDict(self))

    def __reduce__(self):
        return self.__class__, (OrderedDict(self),)

message = 'Some random string of words'
print(OrderedCounter(message.upper()))
pzp
  • 6,249
  • 1
  • 26
  • 38
  • 1
    This note does not explain why order changes between multiple runs on the same implementation and with the same history of insertions and deletions. (also, it is random between interpreter runs, but not within a single run) – viraptor Aug 05 '16 at 02:18
  • @viraptor I don't recall that being the OP's question. The OP simply asked why the dictionary is getting printed in a different order each time he/she runs the program, and that is the question I answered. – pzp Aug 05 '16 at 14:18
  • "If you run this multiple times, you will notice the counts are returned in seemingly random order each time. Why is this?" -> the quoted doc does not explain this. Repeat the original code multiple times in a same run and the order will be arbitrary, but always the same. – viraptor Aug 07 '16 at 10:47
2

This happens due to security. When you're writing any application where external user can provide data which ends up in a dictionary, you need to make sure they don't know what the result of hashing will be. If they do, they can make sure that every new entry they provide will hash to the same bin. When they do that, you end up with your "amortized O(1)" retrievals taking O(n) instead, because every get() from a dictionary will get the same bin and will have to traverse all items in it. (or possibly longer considering other processing of the request)

Have a look at https://131002.net/siphash/siphashdos_appsec12_slides.pdf for some more info.

Almost all languages prevent this by generating a random number at startup and using that as the hash seed, rather than starting from some predefined number like 0.

viraptor
  • 33,322
  • 10
  • 107
  • 191
  • Excellent - thank you. This makes a lot of sense in addition to Stefan's answer above. I really hadn't considered the security angle here. – DCaugs Aug 05 '16 at 02:31
1

The way that dict is implemented is designed for look ups to be quick and efficient. Even as the size of the dict increases. Under the hood this means that the key order may change.

If the order of the keys is important to you, try using an ordereddict from collections.

Batman
  • 8,571
  • 7
  • 41
  • 80
  • This makes sense, but then why does the order change when the size of the dict does not? That's what I'm trying to wrap my head around - if you execute the same simple code over and over, why does Python decide to return the results in different order when literally nothing is changing? – DCaugs Aug 05 '16 at 02:08
0

Since Python 3.7 dictionaries are now insertion ordered (documentation)

Dictionaries preserve insertion order. Note that updating a key does not affect the order. Keys added after deletion are inserted at the end.

So the expected behavior you were expecting in the question now is the actual behavior.

J Agustin Barrachina
  • 3,501
  • 1
  • 32
  • 52