
I have a dictionary whose keys are tuples like (int, str, int, str, int), and the corresponding values are lists of floats of the same size.

I pickled the dictionary twice with the same script:

import pickle
with open(name, 'wb') as source:
    pickle.dump(the_dict, source)

For the two resulting binary files test_1 and test_2, I ran

diff test_1 test_2

in a terminal (I'm using macOS) to see whether I can use diff to tell the difference. However, I received

Binary files test_1 and test_2 differ

Why? Was the same dictionary being pickled in different ways? Does it mean I cannot use diff to tell whether two dictionaries are identical?

meTchaikovsky
  • You probably need to use `OrderedDict()`, otherwise they'll likely get rearranged. – l'L'l Dec 29 '18 at 06:13
  • `diff` is not a good way to compare pickled data, not even for binary data in general. – Klaus D. Dec 29 '18 at 06:21
  • I probably can't reproduce this without your data. Try `diff <(python -m pickletools test_1) <(python -m pickletools test_2)`. This should be more informative than `Binary files ... differ`. – gilch Dec 29 '18 at 06:30
  • This is a near-duplicate (but viewed from the other side) of: [Python pickle not one-to-one: different pickles give same object](https://stackoverflow.com/questions/21271479/python-pickle-not-one-to-one-different-pickles-give-same-object). – Gordon Davisson Dec 29 '18 at 06:54

1 Answer


This may depend on which version of Python you are using. Before Python 3.6, dictionaries did not remember the order in which keys were inserted. CPython 3.6 preserved insertion order as an implementation detail, and Python 3.7 made it a language guarantee.

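For backwards compatibility, you shouldn't depend on the dictionary remembering the order of inserted keys. Instead, you can use OrderedDict from the collections module. Note that OrderedDict only preserves the order in which keys were inserted, so the keys must also be inserted in the same order on every run.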
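
As a rough sketch of that idea, the snippet below sorts the items by key before placing them in an OrderedDict, so the pickled order no longer depends on how the dictionary was built. The helper name `dumps_sorted`, the sample keys and values, and the fixed `protocol=2` are assumptions made for illustration; they are not from the question. It also assumes the keys are mutually sortable, which holds for tuples with the same types in the same positions.

```python
import pickle
from collections import OrderedDict


def dumps_sorted(d, protocol=2):
    """Pickle a dict with its items sorted by key (hypothetical helper).

    The protocol is pinned because the default protocol differs
    between Python versions, which would also change the bytes.
    """
    ordered = OrderedDict(sorted(d.items(), key=lambda item: item[0]))
    return pickle.dumps(ordered, protocol=protocol)


# Made-up sample data in the shape described in the question.
key_1 = (1, 'a', 2, 'b', 3)
key_2 = (4, 'c', 5, 'd', 6)
values_1 = [0.1, 0.2]
values_2 = [0.3, 0.4]

# Same contents, different insertion order.
first = {key_1: values_1, key_2: values_2}
second = {key_2: values_2, key_1: values_1}

same_bytes = dumps_sorted(first) == dumps_sorted(second)
```

Even with a fixed key order, byte-for-byte equality is not guaranteed in general: pickle records repeated references to the same object differently from equal but distinct objects, so two equal dictionaries can still serialize to different bytes.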

Also, running diff on pickled dict data may report differences even though the actual dictionaries are equal, because dicts, unlike lists, generally make no guarantees about order (see above for when that is not the case).
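
To check whether two pickle files hold equal dictionaries, it is probably more reliable to load them and compare the objects than to compare the files. A minimal sketch follows; the function names and the sample data are assumptions for illustration, and the file paths would be test_1 and test_2 from the question.

```python
import pickle


def load_pickle(path):
    """Load a single pickled object from a file."""
    with open(path, 'rb') as f:
        return pickle.load(f)


def same_contents(path_a, path_b):
    """Return True if the two pickle files hold equal objects."""
    return load_pickle(path_a) == load_pickle(path_b)


# In-memory demonstration with made-up data: equal dicts whose
# keys were inserted in a different order.
key_1 = (1, 'a', 2, 'b', 3)
key_2 = (4, 'c', 5, 'd', 6)
first = {key_1: [0.1, 0.2], key_2: [0.3, 0.4]}
second = {key_2: [0.3, 0.4], key_1: [0.1, 0.2]}

# On Python 3.7+ the byte strings differ because the insertion order differs.
bytes_differ = pickle.dumps(first) != pickle.dumps(second)

# The unpickled objects still compare equal, since dict equality ignores order.
objects_equal = pickle.loads(pickle.dumps(first)) == pickle.loads(pickle.dumps(second))
```

With the files from the question, the call would be `same_contents('test_1', 'test_2')`. As with any pickle, only load files from a source you trust.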