efficiently knowing if intersection of two list is empty or not, in python

Question

Suppose I have two lists, L and M. Now I want to know if they share an element. Which would be the fastest way of asking (in python) if they share an element? I don't care which elements they share, or how many, just if they share or not.

For example, in this case

L = [1,2,3,4,5,6]
M = [8,9,10]

I should get False, and here:

L = [1,2,3,4,5,6]
M = [5,6,7]

I should get True.

I hope the question's clear. Thanks!

Manuel

See http://stackoverflow.com/questions/3170055/test-if-lists-share-any-items-in-python for a more thorough analysis of this problem. `not frozenset(L).isdisjoint(M)` seems to be the optimal solution. — Edward Falk, Apr 04 '15 at 19:16

John La Rooy · Accepted Answer · 2010-02-04T06:22:14.043

60

Or more concisely

if set(L) & set(M):
    # there is an intersection
else:
    # no intersection

If you really need True or False

bool(set(L) & set(M))

After running some timings, this seems to be a good option to try too

m_set=set(M)
any(x in m_set  for x in L)

If the items in M or L are not hashable you have to use a less efficient approach like this

any(x in M for x in L)

Here are some timings for 100 item lists. Using sets is considerably faster when there is no intersection, and a bit slower when there is a considerable intersection.

M=range(100)
L=range(100,200)

timeit set(L) & set(M)
10000 loops, best of 3: 32.3 µs per loop

timeit any(x in M for x in L)
1000 loops, best of 3: 374 µs per loop

timeit m_set=frozenset(M);any(x in m_set  for x in L)
10000 loops, best of 3: 31 µs per loop

L=range(50,150)

timeit set(L) & set(M)
10000 loops, best of 3: 18 µs per loop

timeit any(x in M for x in L)
100000 loops, best of 3: 4.88 µs per loop

timeit m_set=frozenset(M);any(x in m_set  for x in L)
100000 loops, best of 3: 9.39 µs per loop


# Now for some random lists
import random
L=[random.randrange(200000) for x in xrange(1000)]
M=[random.randrange(200000) for x in xrange(1000)]

timeit set(L) & set(M)
1000 loops, best of 3: 420 µs per loop

timeit any(x in M for x in L)
10 loops, best of 3: 21.2 ms per loop

timeit m_set=set(M);any(x in m_set  for x in L)
1000 loops, best of 3: 168 µs per loop

timeit m_set=frozenset(M);any(x in m_set  for x in L)
1000 loops, best of 3: 371 µs per loop

edited Feb 04 '10 at 06:22

answered Feb 04 '10 at 05:28

John La Rooy

295,403
53
369
502

1

@gnibbler - Is it provable that the `any()` version is less efficient? It seems like it would go through `M` only until it found an element in `L`, at which point `any` would return `True` and be done. This sounds more efficient than converting both `L` and `M` to sets beforehand. At least, on paper. – Chris Lutz Feb 04 '10 at 05:40
This here, this is the answer. – jathanism Feb 04 '10 at 05:42
2

@Chris, worst case is when when there is no intersection - O(l*m). With sets i believe it is O(l+m) – John La Rooy Feb 04 '10 at 05:43
WOW! so bool(set(L) & set(M)) is faster than any(x in M for x in L)... Who would think? :) Thank you. – Manuel Araoz Feb 04 '10 at 05:54
1

@Manuel - The best, it seems, is to convert _one_ list to a `set` to allow for faster membership testing (`in`), then to filter based on this membership test (`x in m_set for x in L`). @gnibbler, can we get some tests that utilize two randomly constructed lists just for completeness? (and also +1 for a fine job) – Chris Lutz Feb 04 '10 at 06:07
Do you see any difference between frozenset and set in these tests? I just picked frozenset by default because this didn't happen to need mutability. – Darius Bacon Feb 04 '10 at 06:15
@Darius, see the final test. set was considerably faster than frozenset. – John La Rooy Feb 04 '10 at 06:26

Darius Bacon · Answer 2 · 2010-02-04T07:25:29.377

6

To avoid the work of constructing the intersection, and produce an answer as soon as we know that they intersect:

m_set = frozenset(M)
return any(x in m_set for x in L)

Update: gnibbler tried this out and found it to run faster with set() in place of frozenset(). Whaddayaknow.

edited Feb 04 '10 at 07:25

answered Feb 04 '10 at 05:37

Darius Bacon

14,921
5
53
53

score 3 · Answer 3 · answered Feb 04 '10 at 05:26

3

First of all, if you do not need them ordered, then switch to the set type.

If you still need the list type, then do it this way: 0 == False

len(set.intersection(set(L), set(M)))

answered Feb 04 '10 at 05:26

gahooa

131,293
12
98
101

This doesn't seem very efficient. I mean, the whole intersection is been calculated, isn't it!? Or is it lazily evaluated? Thanks! – Manuel Araoz Feb 04 '10 at 05:41
@Manuel, when i tested it, the intersection took less time to calculate than the time converting the lists to sets, so less than 1/3 of the total time – John La Rooy Feb 04 '10 at 06:06

tzot · Answer 4 · 2023-06-01T12:21:45.847

Note: this answer seems to be too-complicated for what at first glance needs to be only a set operation, but sets can contain only hashable items; the original question does not specify what items will be in the list. So this code first tries with sets and then falls back to more generic code.

That's the most generic and efficient in a balanced way I could come up with (comments should make the code easy to understand):

import itertools, operator

def _compare_product(list1, list2):
    "Return if any item in list1 equals any item in list2 exhaustively"
    return any(
        itertools.starmap(
            operator.eq,
            itertools.product(list1, list2)))

def do_they_intersect(list1, list2):
    "Return if any item is common between list1 and list2"

    # do not try to optimize for small list sizes
    if len(list1) * len(list2) <= 100: # pick a small number
        return _compare_product(list1, list2)

    # first try to make a set from one of the lists
    try: a_set= set(list1)
    except TypeError:
        try: a_set= set(list2)
        except TypeError:
            a_set= None
        else:
            a_list= list1
    else:
        a_list= list2

    # here either a_set is None, or we have a_set and a_list

    if a_set:
        return any(itertools.imap(a_set.__contains__, a_list))
    
    # try to sort the lists
    try:
        a_list1= sorted(list1)
        a_list2= sorted(list2)
    except TypeError: # sorry, not sortable
        return _compare_product(list1, list2)

    # they could be sorted, so let's take the N+M road,
    # not the N*M
    
    iter1= iter(a_list1)
    iter2= iter(a_list2)
    try:
        item1= next(iter1)
        item2= next(iter2)
    except StopIteration: # one of the lists is empty
        return False # ie no common items

    while 1:
        if item1 == item2:
            return True
        while item1 < item2:
            try: item1= next(iter1)
            except StopIteration: return False
        while item2 < item1:
            try: item2= next(iter2)
            except StopIteration: return False

HTH.

efficiently knowing if intersection of two list is empty or not, in python

4 Answers4

Linked