I've noticed some strange behavior in Python 3's implementation of json.dumps: the key order changes from one execution to the next when I dump the same object. Googling hasn't helped, because I don't want to sort the keys; I just want them to stay in the same order! Here is an example script:

import json

data = {
    'number': 42,
    'name': 'John Doe',
    'email': 'john.doe@example.com',
    'balance': 235.03,
    'isadmin': False,
    'groceries': [
        'apples',
        'bananas',
        'pears',
    ],
    'nested': {
        'complex': True,
        'value': 2153.23412
    }
}

print(json.dumps(data, indent=2))

When I run this script I get different outputs every time, for example:

$ python print_data.py 
{
  "groceries": [
    "apples",
    "bananas",
    "pears"
  ],
  "isadmin": false,
  "nested": {
    "value": 2153.23412,
    "complex": true
  },
  "email": "john.doe@example.com",
  "number": 42,
  "name": "John Doe",
  "balance": 235.03
}

But then I run it again and I get:

$ python print_data.py 
{
  "email": "john.doe@example.com",
  "balance": 235.03,
  "name": "John Doe",
  "nested": {
    "value": 2153.23412,
    "complex": true
  },
  "isadmin": false,
  "groceries": [
    "apples",
    "bananas",
    "pears"
  ],
  "number": 42
}

I understand that dictionaries are unordered collections and that the order is based on a hash function; however, in Python 2 the order (whatever it is) is fixed and doesn't change from one execution to the next. The problem is that this makes my tests hard to run, because I need to compare the JSON output of two different modules!

Any idea what is going on, and how to fix it? Note that I would like to avoid using an OrderedDict or sorting anything; all that matters is that the string representation stays the same between executions. This is for testing purposes only and has no effect on the implementation of my module.

bbengfort

3 Answers

Python dictionaries and JSON objects are unordered. You can ask json.dumps() to sort the keys in the output; this is meant to ease testing. Set the sort_keys parameter to True:

print(json.dumps(data, indent=2, sort_keys=True))
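A quick check of what sort_keys buys you: the two dicts below are hypothetical, holding the same contents inserted in different orders. This is a minimal sketch assuming Python 3.7+, where plain dicts keep insertion order, so the unsorted outputs differ while the sorted ones match, including the nested object.

```python
import json

# Same contents, different insertion order (hypothetical test data).
a = {"name": "John Doe", "number": 42, "nested": {"value": 2153.23412, "complex": True}}
b = {"nested": {"complex": True, "value": 2153.23412}, "number": 42, "name": "John Doe"}

# Without sort_keys, each dict is serialized in its own iteration order.
print(json.dumps(a) == json.dumps(b))  # False on Python 3.7+

# sort_keys=True sorts keys at every nesting level, so the text is identical.
print(json.dumps(a, indent=2, sort_keys=True) == json.dumps(b, indent=2, sort_keys=True))  # True
```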

See Why is the order in Python dictionaries and sets arbitrary? as to why you see a different order each time.

You can set the PYTHONHASHSEED environment variable to an integer value to 'lock' the dictionary order (a value of 0 disables hash randomisation entirely). Use this only to run tests and not in production, as the whole point of hash randomisation is to prevent an attacker from trivially DoS-ing your program.
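For tests that can't pass extra arguments to json.dumps(), the seed can be fixed from outside the process. The sketch below runs a fresh interpreter twice with the same PYTHONHASHSEED and checks that the output is identical. It assumes Python 3.7+; since dicts there keep insertion order regardless of the seed, it prints a set of strings to make the hash-dependent ordering visible. CHILD and run_child() are hypothetical names for illustration.

```python
import os
import subprocess
import sys

# Child program whose output depends on string hashing. A set is used because,
# unlike a dict on Python 3.7+, its iteration order still follows the hashes.
CHILD = (
    "print(list({'number', 'name', 'email', 'balance',"
    " 'isadmin', 'groceries', 'nested'}))"
)


def run_child(seed):
    """Run CHILD in a fresh interpreter with PYTHONHASHSEED=seed; return stdout."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout


# Two separate interpreter runs with the same fixed seed print the same order.
first = run_child(0)
second = run_child(0)
print(first == second)  # True
```

From a shell, the equivalent is to prefix the command, e.g. `PYTHONHASHSEED=0 python print_data.py`.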

Martijn Pieters
  • From the post you linked, this is what I was looking for: "Note that as of Python 3.3, a random hash seed is used as well, making hash collisions unpredictable to prevent certain types of denial of service (where an attacker renders a Python server unresponsive by causing mass hash collisions). This means that the order of a given dictionary is then also dependent on the random hash seed for the current Python invocation." – bbengfort Jan 21 '16 at 20:15
  • Do you know how to fix the hash seed for testing purposes? My tests require that I don't pass extra arguments into the json.dumps function. – bbengfort Jan 21 '16 at 20:16
  • @bbengfort: You can set the [`PYTHONHASHSEED` environment variable](https://docs.python.org/3/using/cmdline.html#envvar-PYTHONHASHSEED) to an integer value. – Martijn Pieters Jan 21 '16 at 20:17
  • Might be useful if you include the PYTHONHASHSEED environment variable response in your answer -- as that was what I was really looking for. I'm happy to edit my question to make that more obvious if you think that's a good idea. – bbengfort Jan 21 '16 at 20:23

This behavior changed in Python 3.7. The json documentation says this:

Prior to Python 3.7, dict was not guaranteed to be ordered, so inputs and outputs were typically scrambled unless collections.OrderedDict was specifically requested. Starting with Python 3.7, the regular dict became order preserving, so it is no longer necessary to specify collections.OrderedDict for JSON generation and parsing.
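A minimal sketch of what the quoted paragraph means in practice, assuming Python 3.7+: json.dumps() emits keys in insertion order, and json.loads() returns a regular dict that keeps the order the keys appear in the text, so no OrderedDict is needed for a stable round trip. The three keys are borrowed from the question's example.

```python
import json

# Build a plain dict; on Python 3.7+ it remembers insertion order.
data = {}
data["number"] = 42
data["name"] = "John Doe"
data["email"] = "john.doe@example.com"

text = json.dumps(data)
print(text)  # {"number": 42, "name": "John Doe", "email": "john.doe@example.com"}

# json.loads builds a regular dict that preserves the order found in the text.
print(list(json.loads(text)))  # ['number', 'name', 'email']
```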

paulie4

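The story behind this behavior is a hash-collision denial-of-service vulnerability. To prevent it, string hashes are salted with a random value chosen when the interpreter starts, so the same key gets a different hash from one run (or one machine) to the next.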
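To see the salting directly, the sketch below asks fresh interpreters for hash() of a string and of an int under different PYTHONHASHSEED values. It assumes Python 3.7+ (for subprocess.run(capture_output=...)), and hashes_with_seed() is a hypothetical helper name. The same seed reproduces the same string hash, a different seed is expected to change it, and small ints hash to themselves either way, because the randomisation applies to str and bytes hashes, not to ints.

```python
import os
import subprocess
import sys

# Child program: print the hash of a string and of a small int.
CHILD = "print(hash('john.doe@example.com'), hash(42))"


def hashes_with_seed(seed):
    """Return (string hash, int hash) computed by a fresh interpreter using `seed`."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    out = subprocess.run(
        [sys.executable, "-c", CHILD],
        env=env, capture_output=True, text=True, check=True,
    ).stdout
    str_hash, int_hash = out.split()
    return int(str_hash), int(int_hash)


first = hashes_with_seed(1)
again = hashes_with_seed(1)
other = hashes_with_seed(2)

print(first == again)        # True: same seed, same salt, same hashes
print(first[0] == other[0])  # different seed: the string hash is expected to change
print(first[1], other[1])    # 42 42: int hashes don't use the salt
```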

Python 2 (2.7.3 and later) supports hash randomization but leaves it disabled by default, probably for compatibility, since enabling it would, for example, break doctests. Python 3.3 turned it on by default, presumably (an assumption) because it did not need the same compatibility.

Smit Johnth