
Context of the question

This is more a curiosity about how the Python interpreter works than a real issue (though I hope it is an acceptable question anyway).

In my specific case (maybe it doesn't matter, I don't know) I'm running the 3.5 interpreter (32 bit) on Windows 7 (64 bit).

Running code

This is the simple code that I'm running from the Python 3.5 interpreter.

counter_example = {}
counter_example['one'] = 1
counter_example['two'] = 2
counter_example['three'] = 3
for currkey, currvalue in counter_example.items():
    print('%s - %s' % (currkey, currvalue))

I start an interpreter window A and run this code several times (say 3-4 times), then I start a second Python interpreter window B and again run the code 3-4 times, and finally do the same thing with a third interpreter window C.

The output

I've noticed that, if I execute this multiple times in the same interpreter window, I always get the same output, i.e. the items come out in the same order.

Python 3.5.1 (v3.5.1:37a07cee5969, Dec  6 2015, 01:38:48) [MSC v.1900 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> counter_example = {}
>>> counter_example['one'] = 1
>>> counter_example['two'] = 2
>>> counter_example['three'] = 3
>>> for currkey, currvalue in counter_example.items():
...    print ('%s - %s' % (currkey, currvalue))
...
two - 2
one - 1
three - 3
>>> counter_example = {}
>>> counter_example['one'] = 1
>>> counter_example['two'] = 2
>>> counter_example['three'] = 3
>>> for currkey, currvalue in counter_example.items():
...    print ('%s - %s' % (currkey, currvalue))
...
two - 2
one - 1
three - 3
>>>

In a different command window the output will be in a different order, but - surprisingly, IMHO - the order of the items is kept if I rerun the test there, starting again from the dictionary declaration. Why does this happen? What is happening behind the scenes?
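The consistency within one window can be checked directly: rebuild the dictionary from scratch, as a new object, and compare the iteration orders. A minimal sketch (the exact order you see will depend on your Python version and hash seed, but the two orders will match within one process):

```python
def build():
    # Rebuild the dictionary from scratch, exactly as in the session above.
    d = {}
    d['one'] = 1
    d['two'] = 2
    d['three'] = 3
    return list(d.items())

first = build()
second = build()
# Within a single interpreter process the hash seed is fixed, so two
# independently built but identical dictionaries iterate in the same order.
assert first == second
print(first)
```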

Please note

I'm aware that I can do either this

for currkey, currvalue in sorted(counter_example.items()):

or that

from collections import OrderedDict
counter_example = OrderedDict()

But that's not what I'm asking for.
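That said, the per-invocation behaviour can be made observable by pinning the hash seed with the `PYTHONHASHSEED` environment variable. This is a sketch, assuming a subprocess is an acceptable stand-in for a fresh interpreter window:

```python
import os
import subprocess
import sys

def child_hash(seed):
    # Start a fresh interpreter with a fixed hash seed and report hash('one').
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    out = subprocess.run(
        [sys.executable, '-c', "print(hash('one'))"],
        env=env, stdout=subprocess.PIPE, universal_newlines=True,
    )
    return int(out.stdout)

# With the seed pinned, two separate interpreter invocations agree on the
# hash of 'one' - the same reason rerunning the code in one window (one
# fixed seed) reproduces the dictionary order.
assert child_hash(0) == child_hash(0)
```

Without `PYTHONHASHSEED` set, each invocation draws a new random seed, which is why different windows can show different orders.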

  • Why would the order be different? The hash randomisation only runs once, when you start the interpreter, so if you put the same items (with the same hashes) in the dictionary in the same order you'll get the same result, as they'll end up in the same buckets. When you restart the interpreter, you get a new ordering, but it's expected to be consistent within run (otherwise you'd never find anything in a dictionary again!) – jonrsharpe Jun 21 '16 at 08:41
  • The duplicate explains how the order is produced (it is an implementation detail). The randomisation-per-interpreter-invocation is covered by the paragraph *Note that as of Python 3.3, a random hash seed is used as well, making hash collisions unpredictable to prevent certain types of denial of service (where an attacker renders a Python server unresponsive by causing mass hash collisions). This means that the order of a given dictionary is then also dependent on the random hash seed for the current Python invocation.* – Martijn Pieters Jun 21 '16 at 08:41
  • Thanks @MartijnPieters. I'm not sure I understand a detail: why in Python 3.5 is the order the *same* when I rerun all the code? It is written that "This means that the order of a given dictionary is then also dependent on the random hash seed for the current Python invocation", so my understanding is that it should change because the *current* Python invocation does change when I rerun. Where am I going wrong? –  Jun 21 '16 at 08:46
  • Thanks @jonrsharpe What exactly is expected to be consistent within run? If I *declare* the dictionary again, why should the hash randomization be the same? I fail to understand this point –  Jun 21 '16 at 08:49
  • @python3.5_32bit_win7_64bit: either you have disabled hash randomisation (see [`PYTHONHASHSEED`](https://docs.python.org/3/using/cmdline.html#envvar-PYTHONHASHSEED)), or you are not testing with `str`, `bytes` or `datetime` keys, or you managed to get lucky and were given a random hash seed that just happened to produce the same order for your test input. Chance is funny that way. – Martijn Pieters Jun 21 '16 at 08:51
  • As for why the hash seed is set per interpreter invocation: because that suffices to thwart the attack. Producing a random hash seed for each dictionary is overkill and would complicate the `object.__hash__` method. With one hash seed per interpreter you can leave applying that seed to the `str.__hash__`, `bytes.__hash__` and `datetime.__hash__` methods without having to know about the dictionary they are being used in. – Martijn Pieters Jun 21 '16 at 08:55
  • Oh... maybe "current Python invocation" means the current Python interpreter window and it is not the current line of code executed. So that would explain the behaviour. –  Jun 21 '16 at 08:55
  • Yes, that's right - if you got a different hash for every line, dictionaries (and therefore Python) wouldn't work at all, as every time you tried to find a key you'd be looking in the wrong place. The same object must always have the same hash within one invocation of the interpreter. – jonrsharpe Jun 21 '16 at 08:58
  • @jonrsharpe we are saying 2 different things. Of course the *same* object must have the same hash. No doubt and no question about that. But if I declare it again it is *not* the same object. Anyway I think that Martijn Pieters has answered here: http://stackoverflow.com/questions/37939442/python-insights-how-the-interpreter-is-ordering-a-non-ordered-dictionary?noredirect=1#comment63328597_37939442. For performance reasons, the hash seed is kept constant within the same interpreter window –  Jun 21 '16 at 09:02
  • Also things that are considered equal should generally have the same hash - however I create `'one'` I expect that `d['one']` will get back the right item! – jonrsharpe Jun 21 '16 at 09:09
  • @jonrsharpe: you could attach a random hash to each set or dict. But then you'd have to pass that as a parameter to `object.__hash__` each time, making it impossible to cache that value on the object (immutable objects certainly do) and require adding that extra parameter to the `__hash__` method, thus breaking forward compatibility or require introspection for each `__hash__` call. Brrrr. Since the security issue is mitigated by using a new random hash per interpreter, all that mess can be avoided. – Martijn Pieters Jun 21 '16 at 09:15
  • @jonrsharpe to answer your last reply with a counter-example I've done the following. Instead of rerunning the same code, same dictionary, same name, I've replaced counter_example with change_name. The result proves that the hash does *not* depend on the name (as you are suggesting) and the root reason is simply avoiding overkill –  Jun 21 '16 at 09:15
  • @MartijnPieters *wow* that would be a lot of overhead! – jonrsharpe Jun 21 '16 at 09:16
  • @python3.5_32bit_win7_64bit that is absolutely **not** what I was suggesting, evidently you do not understand me. It doesn't depend on the name, that would make no sense at all, it depends on the *values*. – jonrsharpe Jun 21 '16 at 09:17
  • @jonrsharpe Sorry, ok, we're on the same page :-) –  Jun 21 '16 at 09:19
  • @MartijnPieters Sorry again, just to be sure I've understood. Why would it ("attaching a random hash to each set or dict") require an extra parameter to the hash method? Can't you store that hash in the object implementation once it has been created? –  Jun 21 '16 at 09:31
  • @python3.5_32bit_win7_64bit: no, because then two objects can be equal (have the same value), but have a different hash. That'd break the core requirement of hashing. – Martijn Pieters Jun 21 '16 at 09:33
  • @MartijnPieters Yes, correct :-) Basically, this also is the same (sort of) "duplicated" [question](http://stackoverflow.com/questions/27522626/hash-function-in-python-3-3-returns-different-results-between-sessions) The name for the Python interpreter window there is "session" :-) Ok, everything's clarified –  Jun 21 '16 at 09:48
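The invariant the comments above circle around (equal objects must have equal hashes, or dictionary lookups would miss) can be sketched with a small user-defined class; `Point` and its fields are made up for illustration:

```python
class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __eq__(self, other):
        return (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        # Derive the hash from the value, not the identity: two distinct
        # but equal Point objects must land in the same bucket.
        return hash((self.x, self.y))

d = {Point(1, 2): 'a'}
# A *different* but equal object still finds the entry.
assert d[Point(1, 2)] == 'a'
```

If `__hash__` instead returned a per-object random value, the second `Point(1, 2)` would look in the wrong bucket and the lookup would raise `KeyError`, which is exactly why the seed is per interpreter invocation rather than per object.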

0 Answers