How to find list intersection?

Question

a = [1,2,3,4,5]
b = [1,3,5,6]
c = a and b
print c

actual output: [1,3,5,6] expected output: [1,3,5]

How can we achieve a boolean AND operation (list intersection) on two lists?

The problem here is that `a and b` works like the following [statement from the documentation](http://docs.python.org/reference/expressions.html#boolean-operations) mentions it: "_The expression `x and y` first evaluates `x`; if `x` is false, its value is returned; otherwise, `y` is evaluated and the resulting value is returned._" — Tadeck, Dec 06 '11 at 09:52
Does this answer your question? [How can I compare two lists in python and return matches](https://stackoverflow.com/questions/1388818/how-can-i-compare-two-lists-in-python-and-return-matches) — Tomerikoo, Jan 18 '22 at 20:27

score 686 · Accepted Answer · answered Sep 13 '10 at 01:32

686

If order is not important and you don't need to worry about duplicates then you can use set intersection:

>>> a = [1,2,3,4,5]
>>> b = [1,3,5,6]
>>> list(set(a) & set(b))
[1, 3, 5]

answered Sep 13 '10 at 01:32

Mark Byers

811,555
193
1,581
1,452

26

what if `a = [1,1,2,3,4,5]` and `b = [1,1,3,5,6]` then the intersection is `[1,1,3,5]` but by above method it will result in only one `1` i.e. `[1, 3, 5]` what will be the write way to do it then? – Nitish Kumar Pal Oct 10 '18 at 05:18
19

@NItishKumarPal `intersection` is commonly understood to be _set_ based. You are looking for a slightly different animal - and you may need to do that manually by sorting each list and merging the results - and keeping dups in the merging. – WestCoastProjects Jan 06 '19 at 18:51
1

@MarkByers This will _not_ have duplicates afaict. – WestCoastProjects Jan 06 '19 at 21:53
Thanks Mark Byers :) For readers : beware the order of the list. In this case look at Lodewijk answer. – phili_b Dec 01 '20 at 09:47
5

I consider this to be an incorrect answer. The question explicitly mentions lists, not sets. Lists can have the same items more than once and so they are indeed a different beast, which is not addressed in this answer. – forcefsck Dec 19 '21 at 16:17
also lists are ordered, sets are not, so you lose this information (I believe result is unpredictable). – Vincenzooo Jul 01 '22 at 08:52
Anyway there is an issue with duplicates. If they appear a different number of times in the two lists, it is not clear which ones you want to retain (it seems to me that intersection becomes non commutative in this case). – Vincenzooo Jul 01 '22 at 08:59

score 188 · Answer 2 · edited May 12 '22 at 09:42

188

Using list comprehensions is a pretty obvious one for me. Not sure about performance, but at least things stay lists.

[x for x in a if x in b]

Or "all the x values that are in A, if the X value is in B".

edited May 12 '22 at 09:42

vvvvv

25,404
19
49
81

answered Oct 11 '15 at 16:47

Lodewijk

3,741
2
18
15

5

this seems the most pythonic which keeps order. not sure why this isn't upvoted higher!! thx for the great solution! – Bill D Mar 01 '18 at 06:15
63

This is an O(n^2) solution, whereas the solutions above are O(n) – nareddyt Jan 28 '19 at 22:26
16

@nareddyt - make `b` a set and you will have O(n) – jcchuks Aug 21 '19 at 15:16
10

@jcchuks The advantage of this solution is if you need to retain duplicates. If you can be sure of uniqueness, then an O(n) set solution would be better. However, if duplicates are not possible, why is the OP even talking about lists to begin with? The notion of list intersection is a mathematical absurdity – demongolem Nov 26 '19 at 12:07
1

It's actually linear, not squared :/ membership checks can be faster, this way of writing leaves it up to the compiler. (sort/hashtable b, get o(1) checks...) – Lodewijk Sep 23 '20 at 19:00
I'm not sure why @nareddyt says that this is O(n^2), but it's not. It makes one pass over `a` and includes the item if it is in `b`. Otherwise, it passes. – Stella Biderman Jul 03 '21 at 15:19
6

Because checking if an item is in list `b` is O(n), not O(1). You do this `n` times for each element in `a`. So O(n^2). – nareddyt Jul 04 '21 at 03:28
1

Though this may not most time efficient way, it has better flexibility than the selected set intersection method. It is very straight forward to add further calculations/operations to the last x part ("... if x in b"). For example, the request may want to check part of x with b (... if x[1:3] in b), or change type of x (... if str(x) in b). – Yunzhao Xing Nov 03 '21 at 15:04
1

What if the lists are [1,2,2,1] and [2] The output according to your code snippet will be [2,2] – Chaitanya Dokara Feb 23 '23 at 13:07

score 113 · Answer 3 · edited Aug 03 '16 at 12:57

113

If you convert the larger of the two lists into a set, you can get the intersection of that set with any iterable using intersection():

a = [1,2,3,4,5]
b = [1,3,5,6]
set(a).intersection(b)

edited Aug 03 '16 at 12:57

River

8,585
14
54
67

answered Sep 13 '10 at 01:36

Brian R. Bondy

339,232
124
596
636

15

Is this any different than `list(set(a) & set(b))` – user1767754 Jul 03 '17 at 19:21
5

why does it matter which list gets converted to set (assuming n != m)? what's the advantage of only converting one to set? – 3pitt Jun 12 '18 at 20:14
2

Seems like this would be faster – Nathan majicvr.com Sep 26 '18 at 21:40
same problems as the most accepted answer: duplicate removal, loss of oder – Vincenzooo Jul 01 '22 at 08:53

score 39 · Answer 4 · answered Sep 13 '10 at 01:35

39

Make a set out of the larger one:

_auxset = set(a)

Then,

c = [x for x in b if x in _auxset]

will do what you want (preserving b's ordering, not a's -- can't necessarily preserve both) and do it fast. (Using if x in a as the condition in the list comprehension would also work, and avoid the need to build _auxset, but unfortunately for lists of substantial length it would be a lot slower).

If you want the result to be sorted, rather than preserve either list's ordering, an even neater way might be:

c = sorted(set(a).intersection(b))

answered Sep 13 '10 at 01:35

Alex Martelli

854,459
170
1,222
1,395

This is almost certainly slower than the accepted answer but has the advantage that duplicates are not lost. – tripleee Oct 22 '19 at 03:54
1

This is the correct answer and I believe as fast as it can be in a generic situation (in which you can have duplicates and want to preserve order). – Vincenzooo Jul 01 '22 at 08:56

score 29 · Answer 5 · answered Oct 16 '15 at 10:34

Here's some Python 2 / Python 3 code that generates timing information for both list-based and set-based methods of finding the intersection of two lists.

The pure list comprehension algorithms are O(n^2), since in on a list is a linear search. The set-based algorithms are O(n), since set search is O(1), and set creation is O(n) (and converting a set to a list is also O(n)). So for sufficiently large n the set-based algorithms are faster, but for small n the overheads of creating the set(s) make them slower than the pure list comp algorithms.

#!/usr/bin/env python

''' Time list- vs set-based list intersection
    See http://stackoverflow.com/q/3697432/4014959
    Written by PM 2Ring 2015.10.16
'''

from __future__ import print_function, division
from timeit import Timer

setup = 'from __main__ import a, b'
cmd_lista = '[u for u in a if u in b]'
cmd_listb = '[u for u in b if u in a]'

cmd_lcsa = 'sa=set(a);[u for u in b if u in sa]'

cmd_seta = 'list(set(a).intersection(b))'
cmd_setb = 'list(set(b).intersection(a))'

reps = 3
loops = 50000

def do_timing(heading, cmd, setup):
    t = Timer(cmd, setup)
    r = t.repeat(reps, loops)
    r.sort()
    print(heading, r)
    return r[0]

m = 10
nums = list(range(6 * m))

for n in range(1, m + 1):
    a = nums[:6*n:2]
    b = nums[:6*n:3]
    print('\nn =', n, len(a), len(b))
    #print('\nn = %d\n%s %d\n%s %d' % (n, a, len(a), b, len(b)))
    la = do_timing('lista', cmd_lista, setup) 
    lb = do_timing('listb', cmd_listb, setup) 
    lc = do_timing('lcsa ', cmd_lcsa, setup)
    sa = do_timing('seta ', cmd_seta, setup)
    sb = do_timing('setb ', cmd_setb, setup)
    print(la/sa, lb/sa, lc/sa, la/sb, lb/sb, lc/sb)

output

n = 1 3 2
lista [0.082171916961669922, 0.082588911056518555, 0.0898590087890625]
listb [0.069530963897705078, 0.070394992828369141, 0.075379848480224609]
lcsa  [0.11858987808227539, 0.1188349723815918, 0.12825107574462891]
seta  [0.26900982856750488, 0.26902294158935547, 0.27298116683959961]
setb  [0.27218389511108398, 0.27459001541137695, 0.34307217597961426]
0.305460649521 0.258469975867 0.440838458259 0.301898526833 0.255455833892 0.435697630214

n = 2 6 4
lista [0.15915989875793457, 0.16000485420227051, 0.16551494598388672]
listb [0.13000702857971191, 0.13060092926025391, 0.13543915748596191]
lcsa  [0.18650484085083008, 0.18742108345031738, 0.19513416290283203]
seta  [0.33592700958251953, 0.34001994132995605, 0.34146714210510254]
setb  [0.29436492919921875, 0.2953648567199707, 0.30039691925048828]
0.473793098554 0.387009751735 0.555194537893 0.540689066428 0.441652573672 0.633583767462

n = 3 9 6
lista [0.27657914161682129, 0.28098297119140625, 0.28311991691589355]
listb [0.21585917472839355, 0.21679902076721191, 0.22272896766662598]
lcsa  [0.22559309005737305, 0.2271728515625, 0.2323150634765625]
seta  [0.36382699012756348, 0.36453008651733398, 0.36750602722167969]
setb  [0.34979605674743652, 0.35533690452575684, 0.36164689064025879]
0.760194128313 0.59330170819 0.62005595016 0.790686848184 0.61710008036 0.644927481902

n = 4 12 8
lista [0.39616990089416504, 0.39746403694152832, 0.41129183769226074]
listb [0.33485794067382812, 0.33914685249328613, 0.37850618362426758]
lcsa  [0.27405810356140137, 0.2745978832244873, 0.28249192237854004]
seta  [0.39211201667785645, 0.39234519004821777, 0.39317893981933594]
setb  [0.36988520622253418, 0.37011313438415527, 0.37571001052856445]
1.01034878821 0.85398540833 0.698928091731 1.07106176249 0.905302334456 0.740927452493

n = 5 15 10
lista [0.56792402267456055, 0.57422614097595215, 0.57740211486816406]
listb [0.47309303283691406, 0.47619009017944336, 0.47628307342529297]
lcsa  [0.32805585861206055, 0.32813096046447754, 0.3349759578704834]
seta  [0.40036201477050781, 0.40322518348693848, 0.40548801422119141]
setb  [0.39103078842163086, 0.39722800254821777, 0.43811702728271484]
1.41852623806 1.18166313332 0.819398061028 1.45237674242 1.20986133789 0.838951479847

n = 6 18 12
lista [0.77897095680236816, 0.78187918663024902, 0.78467702865600586]
listb [0.629547119140625, 0.63210701942443848, 0.63321495056152344]
lcsa  [0.36563992500305176, 0.36638498306274414, 0.38175487518310547]
seta  [0.46695613861083984, 0.46992206573486328, 0.47583580017089844]
setb  [0.47616910934448242, 0.47661614418029785, 0.4850609302520752]
1.66818870637 1.34819326075 0.783028414812 1.63591241329 1.32210827369 0.767878297495

n = 7 21 14
lista [0.9703209400177002, 0.9734041690826416, 1.0182771682739258]
listb [0.82394003868103027, 0.82625699043273926, 0.82796716690063477]
lcsa  [0.40975093841552734, 0.41210508346557617, 0.42286920547485352]
seta  [0.5086359977722168, 0.50968098640441895, 0.51014018058776855]
setb  [0.48688101768493652, 0.4879908561706543, 0.49204087257385254]
1.90769222837 1.61990115188 0.805587768483 1.99293236904 1.69228211566 0.841583309951

n = 8 24 16
lista [1.204819917678833, 1.2206029891967773, 1.258256196975708]
listb [1.014998197555542, 1.0206191539764404, 1.0343101024627686]
lcsa  [0.50966787338256836, 0.51018595695495605, 0.51319599151611328]
seta  [0.50310111045837402, 0.50556015968322754, 0.51335406303405762]
setb  [0.51472997665405273, 0.51948785781860352, 0.52113485336303711]
2.39478683834 2.01748351664 1.01305257092 2.34068341135 1.97190418975 0.990165516871

n = 9 27 18
lista [1.511646032333374, 1.5133969783782959, 1.5639569759368896]
listb [1.2461750507354736, 1.254518985748291, 1.2613379955291748]
lcsa  [0.5565330982208252, 0.56119203567504883, 0.56451296806335449]
seta  [0.5966339111328125, 0.60275578498840332, 0.64791703224182129]
setb  [0.54694414138793945, 0.5508568286895752, 0.55375313758850098]
2.53362406013 2.08867620074 0.932788243907 2.76380331728 2.27843203069 1.01753187594

n = 10 30 20
lista [1.7777848243713379, 2.1453688144683838, 2.4085969924926758]
listb [1.5070111751556396, 1.5202279090881348, 1.5779800415039062]
lcsa  [0.5954139232635498, 0.59703707695007324, 0.60746097564697266]
seta  [0.61563014984130859, 0.62125110626220703, 0.62354087829589844]
setb  [0.56723213195800781, 0.57257509231567383, 0.57460403442382812]
2.88774814689 2.44791645689 0.967161734066 3.13413984189 2.6567803378 1.04968299523

Generated using a 2GHz single core machine with 2GB of RAM running Python 2.6.6 on a Debian flavour of Linux (with Firefox running in the background).

These figures are only a rough guide, since the actual speeds of the various algorithms are affected differently by the proportion of elements that are in both source lists.

Ahmed Elsafty · Answer 6 · 2019-07-16T08:43:16.627

18

A functional way can be achieved using filter and lambda operator.

list1 = [1,2,3,4,5,6]

list2 = [2,4,6,9,10]

>>> list(filter(lambda x:x in list1, list2))

[2, 4, 6]

Edit: It filters out x that exists in both list1 and list, set difference can also be achieved using:

>>> list(filter(lambda x:x not in list1, list2))
[9,10]

Edit2: python3 filter returns a filter object, encapsulating it with list returns the output list.

edited Jul 16 '19 at 08:43

answered Mar 29 '17 at 11:57

Ahmed Elsafty

542
4
11

Please use the [edit] link to explain how this code works and don't just give the code, as an explanation is more likely to help future readers. See also [answer]. [source](http://stackoverflow.com/users/5244995) – Jed Fox Mar 29 '17 at 14:17
1

With Python3, this returns a filter object. You need to say `list(filter(lambda x:x in list1, list2))` to get it as a list. – Adrian W Jul 13 '19 at 20:07
This is quite good when list are unhashables! – dirac3000 Aug 25 '21 at 09:58

score 14 · Answer 7 · edited Jan 21 '19 at 22:30

14

a = [1,2,3,4,5]
b = [1,3,5,6]
c = list(set(a).intersection(set(b)))

Should work like a dream. And, if you can, use sets instead of lists to avoid all this type changing!

edited Jan 21 '19 at 22:30

Delimitry

2,987
4
30
39

answered Sep 13 '10 at 02:07

Alex Hart

1,663
12
15

If _a_ and _b_ are large then this is faster than other answers – WestCoastProjects Jan 06 '19 at 18:53

score 11 · Answer 8 · answered Apr 27 '21 at 12:00

11

You can also use numpy.intersect1d(ar1, ar2).
It return the unique and sorted values that are in both of two arrays.

answered Apr 27 '21 at 12:00

hamidreza zarrabi

120
1
7

score 6 · Answer 9 · answered Oct 31 '20 at 19:23

6

This way you get the intersection of two lists and also get the common duplicates.

>>> from collections import Counter
>>> a = Counter([1,2,3,4,5])
>>> b = Counter([1,3,5,6])
>>> a &= b
>>> list(a.elements())
[1, 3, 5]

answered Oct 31 '20 at 19:23

Saisankar Gochhayat

155
2
11

2

This should get more votes as it is symmetric (the top scoring answers aren't). Symmetry is when intersect(list_a, list_b) == intersect(list_b, list_a) – Brian Risk Sep 01 '22 at 16:26

score 4 · Answer 10 · answered Sep 06 '21 at 18:08

In the case you have a list of lists map comes handy:

>>> lists = [[1, 2, 3], [2, 3, 4], [2, 3, 5]]
>>> set(lists.pop()).intersection(*map(set, lists))
{2, 3}

would work for similar iterables:

>>> lists = ['ash', 'nazg']
>>> set(lists.pop()).intersection(*map(set, lists))
{'a'}

pop will blow if the list is empty so you may want to wrap in a function:

def intersect_lists(lists):
    try:
        return set(lists.pop()).intersection(*map(set, lists))
    except IndexError: # pop from empty list
        return set()

score 3 · Answer 11 · answered Feb 28 '19 at 18:09

This is an example when you need Each element in the result should appear as many times as it shows in both arrays.

def intersection(nums1, nums2):
    #example:
    #nums1 = [1,2,2,1]
    #nums2 = [2,2]
    #output = [2,2]
    #find first 2 and remove from target, continue iterating

    target, iterate = [nums1, nums2] if len(nums2) >= len(nums1) else [nums2, nums1] #iterate will look into target

    if len(target) == 0:
            return []

    i = 0
    store = []
    while i < len(iterate):

         element = iterate[i]

         if element in target:
               store.append(element)
               target.remove(element)

         i += 1


    return store

score 2 · Answer 12 · answered May 21 '19 at 11:44

It might be late but I just thought I should share for the case where you are required to do it manually (show working - haha) OR when you need all elements to appear as many times as possible or when you also need it to be unique.

Kindly note that tests have also been written for it.



    from nose.tools import assert_equal

    '''
    Given two lists, print out the list of overlapping elements
    '''

    def overlap(l_a, l_b):
        '''
        compare the two lists l_a and l_b and return the overlapping
        elements (intersecting) between the two
        '''

        #edge case is when they are the same lists
        if l_a == l_b:
            return [] #no overlapping elements

        output = []

        if len(l_a) == len(l_b):
            for i in range(l_a): #same length so either one applies
                if l_a[i] in l_b:
                    output.append(l_a[i])

            #found all by now
            #return output #if repetition does not matter
            return list(set(output))

        else:
            #find the smallest and largest lists and go with that
            sm = l_a if len(l_a)  len(l_b) else l_b

            for i in range(len(sm)):
                if sm[i] in lg:
                    output.append(sm[i])

            #return output #if repetition does not matter
            return list(set(output))

    ## Test the Above Implementation

    a = [1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
    b = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
    exp = [1, 2, 3, 5, 8, 13]

    c = [4, 4, 5, 6]
    d = [5, 7, 4, 8 ,6 ] #assuming it is not ordered
    exp2 = [4, 5, 6]

    class TestOverlap(object):

        def test(self, sol):
            t = sol(a, b)
            assert_equal(t, exp)
            print('Comparing the two lists produces')
            print(t)

            t = sol(c, d)
            assert_equal(t, exp2)
            print('Comparing the two lists produces')
            print(t)

            print('All Tests Passed!!')

    t = TestOverlap()
    t.test(overlap)

score 2 · Answer 13 · edited Mar 17 '21 at 07:11

Most of the solutions here don't take the order of elements in the list into account and treat lists like sets. If on the other hand you want to find one of the longest subsequences contained in both lists, you can try the following code.

def intersect(a, b):
    if a == [] or b == []: 
        return []
    inter_1 = intersect(a[1:], b)
    if a[0] in b:
        idx = b.index(a[0])
        inter_2 = [a[0]] + intersect(a[1:], b[idx+1:])        
        if len(inter_1) <= len(inter_2):
            return inter_2
    return inter_1

For a=[1,2,3] and b=[3,1,4,2] this returns [1,2] rather than [1,2,3]. Note that such a subsequence is not unique as [1], [2], [3] are all solutions for a=[1,2,3] and b=[3,2,1].

score 1 · Answer 14 · answered Sep 13 '10 at 01:35

1

If, by Boolean AND, you mean items that appear in both lists, e.g. intersection, then you should look at Python's set and frozenset types.

answered Sep 13 '10 at 01:35

Tim McNamara

18,019
4
52
83

score 1 · Answer 15 · answered Jun 15 '20 at 00:58

1

You can also use a counter! It doesn't preserve the order, but it'll consider the duplicates:

>>> from collections import Counter
>>> a = [1,2,3,4,5]
>>> b = [1,3,5,6]
>>> d1, d2 = Counter(a), Counter(b)
>>> c = [n for n in d1.keys() & d2.keys() for _ in range(min(d1[n], d2[n]))]
>>> print(c)
[1,3,5]

answered Jun 15 '20 at 00:58

Atonal

520
4
14

1

In your example you could just do `c = (d1 & d2).elements()`. – Juan A. Navarro Nov 24 '21 at 09:25

score 1 · Answer 16 · answered Oct 16 '21 at 23:13

1

when we used tuple and we want to intersection

a=([1,2,3,4,5,20], [8,3,9,5,1,4,20])

for i in range(len(a)):

    b=set(a[i-1]).intersection(a[i])
print(b)
{1, 3, 4, 5, 20}

answered Oct 16 '21 at 23:13

Hossein MEY

41
6

Can this not just be done using `set(a[0]).intersection(*a[1:])`? – Jab Oct 16 '21 at 23:15

score 0 · Answer 17 · answered Mar 21 '23 at 08:25

Store number of Occurrences of each element of list b inside a dict. Then iterate on list a and whenever current element found in dict push it inside result array and decrease its occurrence by one

from collections import Counter
a = [1,2,3,4,5,3]
b = [1,3,5,6,3]
res = []
b_key = Counter(b)
for i in a:
    if i in b_key and b_key[i] > 0:
        res.append(i)
        b_key[i] -= 1
    
print(res)

How to find list intersection?

17 Answers17

output : [1, 3, 5, 3]

Linked

Related