1

I understand why putting mutable objects in a dictionary is dangerous. However, converting all lists/sets to tuples/frozensets is expensive; and for many types, there's no easily available immutable versions at all. So, sometimes it may be worth hashing mutable objects directly, with the appropriate precautions to ensure the objects in question are never modified.

Before I start implementing the very complicated custom hash functions for mutable objects, I wanted to check if there's any disadvantage to using pickle.dumps as the hash function - either in terms of performance or collisions or anything else.

Community
  • 1
  • 1
max
  • 49,282
  • 56
  • 208
  • 355
  • 1
    Even if it works, it's likely not faster (or even slower) than just converting it to the immutable equivalent. Edit: `timeit` confirms this for me. Converting a ~4000 element set into a frozenset is 223 µs/loop whereas `pickle.dumps` takes 19.8 ms/loop -- read, it's an order of magnitude **slower**. –  Oct 04 '12 at 13:08
  • That's 2 orders of magnitude? If so, can you post it as an answer? – max Oct 04 '12 at 16:21

1 Answers1

7

The output from pickle is not guaranteed to be canonical for similar reasons to dict and set order being non-deterministic. Don't use pickle or pprint or repr for hashing.

gps
  • 1,360
  • 12
  • 12
  • 1
    Is this still valid for protocol=5? If not, how to hash sets/frozensets deterministically? – dawid Feb 07 '23 at 04:16