Hashing a tuple in Python where order matters?

Question

I have:

tuple1 = token1, token2
tuple2 = token2, token1
for tuple in [tuple1, tuple2]:
    if tuple in dict:
        dict[tuple] += 1
    else:
        dict[tuple] = 1

However, both tuple 1 and tuple2 are getting the same counts. What is a way to hash a group of 2 things such that order matters?

mgilson · Accepted Answer · 2013-01-17T22:06:23.507

23

Order is taken into account when hashing:

>>> hash((1,2))
1299869600
>>> hash((2,1))
1499606158

This assumes that the objects themselves have unique hashes. Even if they don't, you could still be OK when using it in a dictionary (as long as the objects themselves aren't equal as defined by their __eq__ method):

>>> t1 = 'a',hash('a') 
>>> [hash(x) for x in t1]  #both elements in the tuple have same hash value since `int` hash to themselves in cpython
[-468864544, -468864544]
>>> t2 = hash('a'),'a'
>>> hash(t1)
1486610051
>>> hash(t2)
1486610051
>>> d = {t1:1,t2:2}  #This is OK.  dict's don't fail when there is a hash collision
>>> d
{('a', -468864544): 1, (-468864544, 'a'): 2}
>>> d[t1]+=7
>>> d[t1]
8
>>> d[t1]+=7
>>> d[t1]
15
>>> d[t2]   #didn't touch d[t2] as expected.
2

Note that due to hash collisions, this dict is likely to be less efficient than another dict where there aren't hash collisions :)

edited Jan 17 '13 at 22:06

answered Jan 17 '13 at 21:18

mgilson

300,191
65
633
696

Assuming `token1` and `token2` `hash()` to different values themselves, of course. – Silas Ray Jan 17 '13 at 21:51
@sr2222 -- Yes, of course, but I suppose it is worth documenting that as you can override `__hash__` on a class in such a way to make order irrelevant when you have a tuple containing instances of that class ... but even then, depending on how `__eq__` was implemented, your `dict` could still come out OK (although really inefficient due to hash collisions). – mgilson Jan 17 '13 at 21:52
@sr2222 -- I've updated my answer further because of your comment. It was fun for me and I thought you might find it interesting. – mgilson Jan 17 '13 at 22:04
That is interesting. I need to reread the docs on the internal workings of dicts... – Silas Ray Jan 17 '13 at 22:07
1

@sr2222 -- I learned *a lot* when I watched the video that I reposted [here](http://stackoverflow.com/questions/12165200/order-of-unordered-python-sets/12165239#12165239) – mgilson Jan 17 '13 at 22:09

score 7 · Answer 2 · answered Jan 17 '13 at 21:19

7

The reason they are getting the same count is that your code explicitly increments both token1,token2 and token2,token1 counts at the same time. If you don't, the counts won't stay in lockstep:

In [16]: import collections

In [17]: d = collections.defaultdict(int)

In [18]: d[1,2] += 1

In [19]: d[1,2]
Out[19]: 1

In [20]: d[2,1]
Out[20]: 0

answered Jan 17 '13 at 21:19

NPE

486,780
108
951
1,012

1

I think that I would usually write this as `d[(1,2)]` rather than `d[1,2]` even though they are equivalent ... – mgilson Jan 17 '13 at 21:22

score 0 · Answer 3 · edited Jan 17 '13 at 21:43

0

It seems like you have posted one instance of the body of a loop. Might I suggest that you use collections.Counter for what you are trying to do, which does exactly what you want, but in one line:

counter = (collections.Counter(myListOfTuples) + 
           collections.Counter([j,i for i,j in myListOfTuples]))

edited Jan 17 '13 at 21:43

mgilson

300,191
65
633
696

answered Jan 17 '13 at 21:25

inspectorG4dget

110,290
27
149
241

Hashing a tuple in Python where order matters?

3 Answers3