
First of all, I know how to deal with this situation. I am not asking how I can make an OrderedDict.

I am asking why the same dictionary, created the same way, can have a different key order. What is the logic behind this? Isn't it the exact same code, running independently of time and environment? What changes between runs and makes Python produce a different result?

    Python dictionaries, as well as the mapping types in many other languages, are implemented using [hash tables](https://en.wikipedia.org/wiki/Hash_table). If you understand how they work, you will then understand why dictionaries do not have a useful ordering. – Colonel Thirty Two Jan 18 '16 at 16:31
  • The hash function in Python is quite well defined (here is one explanation http://www.laurentluce.com/posts/python-dictionary-implementation/). The resulting values from the hash function may vary depending on the underlying hardware (32/64 bit) or possibly other nuances (endian-ness). I would expect the order to be the same for the same code running on the same machine using the same version of Python. However, I would never rely on that. – Gordon Linoff Jan 18 '16 at 16:33
    @Reti43, ...though the accepted answer there isn't yet accurate for versions of Python implementing hash randomization. – Charles Duffy Jan 18 '16 at 16:39
  • It's related, but they're not the same question. That is asking "why is the ordering arbitrary", this asking "why is it not consistent (between runs)". – dimo414 Jan 18 '16 at 16:53

2 Answers


This behavior is detailed in object.__hash__()'s specification; it's to prevent certain types of malicious input from breaking applications:

Note: By default, the __hash__() values of str, bytes and datetime objects are “salted” with an unpredictable random value. Although they remain constant within an individual Python process, they are not predictable between repeated invocations of Python.

This is intended to provide protection against a denial-of-service caused by carefully-chosen inputs that exploit the worst case performance of a dict insertion, O(n^2) complexity. See http://www.ocert.org/advisories/ocert-2011-003.html for details.

Before Python 3.3, hash randomization was not enabled by default, so a dictionary would have the same order between different runs of the same application.

I answered a related question about disabling this behavior, which links to some of the relevant source code.
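A minimal sketch of the behaviour described in the quoted note, assuming CPython 3.7+ (for `subprocess.run(..., capture_output=True, text=True)`). Because dicts in Python 3.7+ iterate in insertion order, the effect is easiest to observe in the string hashes themselves, which decide where keys land in hash-based containers. It launches fresh interpreters with different `PYTHONHASHSEED` values (a real environment variable; `0` disables randomization, `random` is the default behaviour). The helper `run_child`, the `CHILD` snippet, the string `'spam'`, and the seed values are arbitrary, hypothetical choices for illustration.

```python
import os
import subprocess
import sys

# Hypothetical child program: print the hash of a string and whether salting is on.
CHILD = "import sys; print(hash('spam'), sys.flags.hash_randomization)"


def run_child(seed):
    """Run a fresh interpreter with PYTHONHASHSEED=seed and return its output line."""
    env = dict(os.environ, PYTHONHASHSEED=seed)
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


print(run_child("1"))       # one fixed salt: the same output on every run
print(run_child("2"))       # a different salt, so a different hash for 'spam'
print(run_child("0"))       # 0 turns salting off; the flag reports 0
print(run_child("random"))  # the default: a fresh, unpredictable salt per process
```

Pinning `PYTHONHASHSEED` to a fixed value is mainly useful for reproducing a test failure; it gives up the denial-of-service protection the note describes.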

  • I think the question is more about why things don't hash the same way each time, end up in the same table slots, and thus end up in the same order, even if that order isn't user-specified / clearly-sorted / etc. That's a more interesting one. :) – Charles Duffy Jan 18 '16 at 16:29
  • You're right; corrected :) – dimo414 Jan 18 '16 at 16:36

Since Python 3.3, hash randomization is enabled by default for security reasons.

The concern is that an attacker can feed a program specially crafted input that leads to many hash collisions. This forces the dictionary into its worst-case performance and can effectively cause a denial of service.
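To make the worst case concrete, here is a rough sketch that counts how many `__eq__` calls a dict needs when every key shares one hash value. The `Key` class, the `EQ_CALLS` counter and `eq_calls_for_inserts` are hypothetical names; forcing every hash to `42` stands in for attacker-chosen inputs. On CPython the shared-hash case needs at least n(n-1)/2 comparisons, i.e. it grows quadratically, while distinct hashes need few or none.

```python
# Hypothetical key type: with collide=True every instance hashes to 42,
# standing in for attacker-chosen inputs that all share one hash value.
EQ_CALLS = 0


class Key:
    def __init__(self, value, collide):
        self.value = value
        self.collide = collide

    def __hash__(self):
        return 42 if self.collide else hash(self.value)

    def __eq__(self, other):
        # Count every equality check the dict has to perform.
        global EQ_CALLS
        EQ_CALLS += 1
        return isinstance(other, Key) and self.value == other.value


def eq_calls_for_inserts(n, collide):
    """Insert n distinct keys into a fresh dict; return the number of __eq__ calls."""
    global EQ_CALLS
    EQ_CALLS = 0
    table = {}
    for i in range(n):
        table[Key(i, collide)] = i
    return EQ_CALLS


for n in (100, 200, 400):
    spread = eq_calls_for_inserts(n, collide=False)
    colliding = eq_calls_for_inserts(n, collide=True)
    print(f"n={n}: distinct hashes -> {spread} comparisons, "
          f"one shared hash -> {colliding} comparisons")
```

Hash randomization does not stop collisions from existing; it makes it impractical for an attacker to predict which strings will collide in a given process.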

Many other languages often used for web programming also implemented hash randomization for their hash maps in roughly the same timeframe.

Before hash randomization was introduced, you would usually get the same dictionary ordering whenever you inserted and removed keys in the same sequence. Different sequences of inserts and removals that produced the same final set of keys could still yield different orders, because of collision resolution and internal hash table resizing. Either way, this was an accident of the implementation and was never guaranteed by the language specification.
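Integer hashes are not salted, so sets of small ints still show this pre-randomization behaviour today (sets are used here because Python 3.7+ dicts iterate in insertion order). A minimal sketch, assuming CPython's smallest set table has 8 slots: `0` and `8` both map to slot 0, so whichever is inserted second is moved to another slot by collision resolution.

```python
# 0 and 8 land in the same bucket of CPython's initial 8-slot set table,
# because int hashes are not salted and hash(0) & 7 == hash(8) & 7 == 0.
a = set()
a.add(0)
a.add(8)  # collides with 0 and is placed in the next probed slot

b = set()
b.add(8)
b.add(0)  # now 0 is the one that collides and gets moved

print(a == b)            # True: same final elements
print(list(a), list(b))  # [0, 8] [8, 0] on CPython: order depends on history
```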
