4

What is the quickest way to find matching 2-tuples in another list of 2-tuples?

The following codes looks extremely inefficient. loc1 and loc2 are list of tuples of (x,y) coordinates.

loc3=[]
for loc in loc1:
    if loc in loc2:
        loc3.append(loc)

I think hashing is the key but not sure how to do it on Python. Please teach me an elegant code. Thanks.

notilas
  • 2,323
  • 4
  • 23
  • 36
  • 1
    You are completely right that a hashing is the key. And fortunately, Python makes that easy with the built-in `set` and `dict` classes, built around hash tables. So, mgilson's answer is exactly what you're looking for. – abarnert Jan 08 '13 at 01:40

1 Answers1

9

You can use sets and intersection:

loc3 = set(loc1).intersection(loc2)

This gives you a set which is unordered and won't contain duplicates (and enforces that the items are hashable). If that's a problem, see the other answer by Phil Frost. However, this should be significantly more efficient where order and duplicates are unnecessary.

A order preserving solution which can contain duplicates, but requires hashability of the items (in loc2) is as follows:

sloc2 = set(loc2)
loc3 = [ item for item in loc1 if item in sloc2 ]  #still O(m)

In python, a set is simply a hash table. Checking to see if an item is contained in that set is an (approximately) O(1) operation because the position of the item is found via hashing.

mgilson
  • 300,191
  • 65
  • 633
  • 696
  • 2
    +1 - you can also use this: `loc3 = list(set(loc1) & set(loc2))`. – Tadeck Jan 08 '13 at 00:49
  • @Tadeck -- Yeah, that's completely correct, but that requires building an extra `set` :). And I prefer `intersection` as it feels more explicit to me. – mgilson Jan 08 '13 at 00:50
  • Generator expressions / generators are another solution with different time / space costs. – Phil Frost Jan 08 '13 at 01:06
  • +1. I didn't realize you had the second version in your answer now, and wasted time writing the same answer… Anyway, it might be worth explaining that a `set` is a hash table, and checking `item in sloc2` just requires hashing `item`, so it's exactly what the OP knew he wanted but didn't know how to do in Python. – abarnert Jan 08 '13 at 01:46