6

I have:

tuple1 = token1, token2
tuple2 = token2, token1
for tuple in [tuple1, tuple2]:
    if tuple in dict:
        dict[tuple] += 1
    else:
        dict[tuple] = 1

However, both tuple 1 and tuple2 are getting the same counts. What is a way to hash a group of 2 things such that order matters?

Andy Hayden
  • 359,921
  • 101
  • 625
  • 535
Shazam
  • 689
  • 2
  • 9
  • 17

3 Answers3

23

Order is taken into account when hashing:

>>> hash((1,2))
1299869600
>>> hash((2,1))
1499606158

This assumes that the objects themselves have unique hashes. Even if they don't, you could still be OK when using it in a dictionary (as long as the objects themselves aren't equal as defined by their __eq__ method):

>>> t1 = 'a',hash('a') 
>>> [hash(x) for x in t1]  #both elements in the tuple have same hash value since `int` hash to themselves in cpython
[-468864544, -468864544]
>>> t2 = hash('a'),'a'
>>> hash(t1)
1486610051
>>> hash(t2)
1486610051
>>> d = {t1:1,t2:2}  #This is OK.  dict's don't fail when there is a hash collision
>>> d
{('a', -468864544): 1, (-468864544, 'a'): 2}
>>> d[t1]+=7
>>> d[t1]
8
>>> d[t1]+=7
>>> d[t1]
15
>>> d[t2]   #didn't touch d[t2] as expected.
2

Note that due to hash collisions, this dict is likely to be less efficient than another dict where there aren't hash collisions :)

mgilson
  • 300,191
  • 65
  • 633
  • 696
  • Assuming `token1` and `token2` `hash()` to different values themselves, of course. – Silas Ray Jan 17 '13 at 21:51
  • @sr2222 -- Yes, of course, but I suppose it is worth documenting that as you can override `__hash__` on a class in such a way to make order irrelevant when you have a tuple containing instances of that class ... but even then, depending on how `__eq__` was implemented, your `dict` could still come out OK (although really inefficient due to hash collisions). – mgilson Jan 17 '13 at 21:52
  • @sr2222 -- I've updated my answer further because of your comment. It was fun for me and I thought you might find it interesting. – mgilson Jan 17 '13 at 22:04
  • That is interesting. I need to reread the docs on the internal workings of dicts... – Silas Ray Jan 17 '13 at 22:07
  • 1
    @sr2222 -- I learned *a lot* when I watched the video that I reposted [here](http://stackoverflow.com/questions/12165200/order-of-unordered-python-sets/12165239#12165239) – mgilson Jan 17 '13 at 22:09
7

The reason they are getting the same count is that your code explicitly increments both token1,token2 and token2,token1 counts at the same time. If you don't, the counts won't stay in lockstep:

In [16]: import collections

In [17]: d = collections.defaultdict(int)

In [18]: d[1,2] += 1

In [19]: d[1,2]
Out[19]: 1

In [20]: d[2,1]
Out[20]: 0
NPE
  • 486,780
  • 108
  • 951
  • 1,012
  • 1
    I think that I would usually write this as `d[(1,2)]` rather than `d[1,2]` even though they are equivalent ... – mgilson Jan 17 '13 at 21:22
0

It seems like you have posted one instance of the body of a loop. Might I suggest that you use collections.Counter for what you are trying to do, which does exactly what you want, but in one line:

counter = (collections.Counter(myListOfTuples) + 
           collections.Counter([j,i for i,j in myListOfTuples]))
mgilson
  • 300,191
  • 65
  • 633
  • 696
inspectorG4dget
  • 110,290
  • 27
  • 149
  • 241