424
a = [1,2,3,4,5]
b = [1,3,5,6]
c = a and b
print c

actual output: [1,3,5,6] expected output: [1,3,5]

How can we achieve a boolean AND operation (list intersection) on two lists?

dreftymac
  • 31,404
  • 26
  • 119
  • 182
csguy11
  • 5,007
  • 6
  • 22
  • 20
  • 10
    The problem here is that `a and b` works like the following [statement from the documentation](http://docs.python.org/reference/expressions.html#boolean-operations) mentions it: "_The expression `x and y` first evaluates `x`; if `x` is false, its value is returned; otherwise, `y` is evaluated and the resulting value is returned._" – Tadeck Dec 06 '11 at 09:52
  • Does this answer your question? [How can I compare two lists in python and return matches](https://stackoverflow.com/questions/1388818/how-can-i-compare-two-lists-in-python-and-return-matches) – Tomerikoo Jan 18 '22 at 20:27

17 Answers17

686

If order is not important and you don't need to worry about duplicates then you can use set intersection:

>>> a = [1,2,3,4,5]
>>> b = [1,3,5,6]
>>> list(set(a) & set(b))
[1, 3, 5]
Mark Byers
  • 811,555
  • 193
  • 1,581
  • 1,452
  • 26
    what if `a = [1,1,2,3,4,5]` and `b = [1,1,3,5,6]` then the intersection is `[1,1,3,5]` but by above method it will result in only one `1` i.e. `[1, 3, 5]` what will be the write way to do it then? – Nitish Kumar Pal Oct 10 '18 at 05:18
  • 19
    @NItishKumarPal `intersection` is commonly understood to be _set_ based. You are looking for a slightly different animal - and you may need to do that manually by sorting each list and merging the results - and keeping dups in the merging. – WestCoastProjects Jan 06 '19 at 18:51
  • 1
    @MarkByers This will _not_ have duplicates afaict. – WestCoastProjects Jan 06 '19 at 21:53
  • Thanks Mark Byers :) For readers : beware the order of the list. In this case look at Lodewijk answer. – phili_b Dec 01 '20 at 09:47
  • 5
    I consider this to be an incorrect answer. The question explicitly mentions lists, not sets. Lists can have the same items more than once and so they are indeed a different beast, which is not addressed in this answer. – forcefsck Dec 19 '21 at 16:17
  • also lists are ordered, sets are not, so you lose this information (I believe result is unpredictable). – Vincenzooo Jul 01 '22 at 08:52
  • Anyway there is an issue with duplicates. If they appear a different number of times in the two lists, it is not clear which ones you want to retain (it seems to me that intersection becomes non commutative in this case). – Vincenzooo Jul 01 '22 at 08:59
188

Using list comprehensions is a pretty obvious one for me. Not sure about performance, but at least things stay lists.

[x for x in a if x in b]

Or "all the x values that are in A, if the X value is in B".

vvvvv
  • 25,404
  • 19
  • 49
  • 81
Lodewijk
  • 3,741
  • 2
  • 18
  • 15
  • 5
    this seems the most pythonic which keeps order. not sure why this isn't upvoted higher!! thx for the great solution! – Bill D Mar 01 '18 at 06:15
  • 63
    This is an O(n^2) solution, whereas the solutions above are O(n) – nareddyt Jan 28 '19 at 22:26
  • 16
    @nareddyt - make `b` a set and you will have O(n) – jcchuks Aug 21 '19 at 15:16
  • 10
    @jcchuks The advantage of this solution is if you need to retain duplicates. If you can be sure of uniqueness, then an O(n) set solution would be better. However, if duplicates are not possible, why is the OP even talking about lists to begin with? The notion of list intersection is a mathematical absurdity – demongolem Nov 26 '19 at 12:07
  • 1
    It's actually linear, not squared :/ membership checks can be faster, this way of writing leaves it up to the compiler. (sort/hashtable b, get o(1) checks...) – Lodewijk Sep 23 '20 at 19:00
  • I'm not sure why @nareddyt says that this is O(n^2), but it's not. It makes one pass over `a` and includes the item if it is in `b`. Otherwise, it passes. – Stella Biderman Jul 03 '21 at 15:19
  • 6
    Because checking if an item is in list `b` is O(n), not O(1). You do this `n` times for each element in `a`. So O(n^2). – nareddyt Jul 04 '21 at 03:28
  • 1
    Though this may not most time efficient way, it has better flexibility than the selected set intersection method. It is very straight forward to add further calculations/operations to the last x part ("... if x in b"). For example, the request may want to check part of x with b (... if x[1:3] in b), or change type of x (... if str(x) in b). – Yunzhao Xing Nov 03 '21 at 15:04
  • 1
    What if the lists are [1,2,2,1] and [2] The output according to your code snippet will be [2,2] – Chaitanya Dokara Feb 23 '23 at 13:07
113

If you convert the larger of the two lists into a set, you can get the intersection of that set with any iterable using intersection():

a = [1,2,3,4,5]
b = [1,3,5,6]
set(a).intersection(b)
River
  • 8,585
  • 14
  • 54
  • 67
Brian R. Bondy
  • 339,232
  • 124
  • 596
  • 636
39

Make a set out of the larger one:

_auxset = set(a)

Then,

c = [x for x in b if x in _auxset]

will do what you want (preserving b's ordering, not a's -- can't necessarily preserve both) and do it fast. (Using if x in a as the condition in the list comprehension would also work, and avoid the need to build _auxset, but unfortunately for lists of substantial length it would be a lot slower).

If you want the result to be sorted, rather than preserve either list's ordering, an even neater way might be:

c = sorted(set(a).intersection(b))
Alex Martelli
  • 854,459
  • 170
  • 1,222
  • 1,395
  • This is almost certainly slower than the accepted answer but has the advantage that duplicates are not lost. – tripleee Oct 22 '19 at 03:54
  • 1
    This is the correct answer and I believe as fast as it can be in a generic situation (in which you can have duplicates and want to preserve order). – Vincenzooo Jul 01 '22 at 08:56
29

Here's some Python 2 / Python 3 code that generates timing information for both list-based and set-based methods of finding the intersection of two lists.

The pure list comprehension algorithms are O(n^2), since in on a list is a linear search. The set-based algorithms are O(n), since set search is O(1), and set creation is O(n) (and converting a set to a list is also O(n)). So for sufficiently large n the set-based algorithms are faster, but for small n the overheads of creating the set(s) make them slower than the pure list comp algorithms.

#!/usr/bin/env python

''' Time list- vs set-based list intersection
    See http://stackoverflow.com/q/3697432/4014959
    Written by PM 2Ring 2015.10.16
'''

from __future__ import print_function, division
from timeit import Timer

setup = 'from __main__ import a, b'
cmd_lista = '[u for u in a if u in b]'
cmd_listb = '[u for u in b if u in a]'

cmd_lcsa = 'sa=set(a);[u for u in b if u in sa]'

cmd_seta = 'list(set(a).intersection(b))'
cmd_setb = 'list(set(b).intersection(a))'

reps = 3
loops = 50000

def do_timing(heading, cmd, setup):
    t = Timer(cmd, setup)
    r = t.repeat(reps, loops)
    r.sort()
    print(heading, r)
    return r[0]

m = 10
nums = list(range(6 * m))

for n in range(1, m + 1):
    a = nums[:6*n:2]
    b = nums[:6*n:3]
    print('\nn =', n, len(a), len(b))
    #print('\nn = %d\n%s %d\n%s %d' % (n, a, len(a), b, len(b)))
    la = do_timing('lista', cmd_lista, setup) 
    lb = do_timing('listb', cmd_listb, setup) 
    lc = do_timing('lcsa ', cmd_lcsa, setup)
    sa = do_timing('seta ', cmd_seta, setup)
    sb = do_timing('setb ', cmd_setb, setup)
    print(la/sa, lb/sa, lc/sa, la/sb, lb/sb, lc/sb)

output

n = 1 3 2
lista [0.082171916961669922, 0.082588911056518555, 0.0898590087890625]
listb [0.069530963897705078, 0.070394992828369141, 0.075379848480224609]
lcsa  [0.11858987808227539, 0.1188349723815918, 0.12825107574462891]
seta  [0.26900982856750488, 0.26902294158935547, 0.27298116683959961]
setb  [0.27218389511108398, 0.27459001541137695, 0.34307217597961426]
0.305460649521 0.258469975867 0.440838458259 0.301898526833 0.255455833892 0.435697630214

n = 2 6 4
lista [0.15915989875793457, 0.16000485420227051, 0.16551494598388672]
listb [0.13000702857971191, 0.13060092926025391, 0.13543915748596191]
lcsa  [0.18650484085083008, 0.18742108345031738, 0.19513416290283203]
seta  [0.33592700958251953, 0.34001994132995605, 0.34146714210510254]
setb  [0.29436492919921875, 0.2953648567199707, 0.30039691925048828]
0.473793098554 0.387009751735 0.555194537893 0.540689066428 0.441652573672 0.633583767462

n = 3 9 6
lista [0.27657914161682129, 0.28098297119140625, 0.28311991691589355]
listb [0.21585917472839355, 0.21679902076721191, 0.22272896766662598]
lcsa  [0.22559309005737305, 0.2271728515625, 0.2323150634765625]
seta  [0.36382699012756348, 0.36453008651733398, 0.36750602722167969]
setb  [0.34979605674743652, 0.35533690452575684, 0.36164689064025879]
0.760194128313 0.59330170819 0.62005595016 0.790686848184 0.61710008036 0.644927481902

n = 4 12 8
lista [0.39616990089416504, 0.39746403694152832, 0.41129183769226074]
listb [0.33485794067382812, 0.33914685249328613, 0.37850618362426758]
lcsa  [0.27405810356140137, 0.2745978832244873, 0.28249192237854004]
seta  [0.39211201667785645, 0.39234519004821777, 0.39317893981933594]
setb  [0.36988520622253418, 0.37011313438415527, 0.37571001052856445]
1.01034878821 0.85398540833 0.698928091731 1.07106176249 0.905302334456 0.740927452493

n = 5 15 10
lista [0.56792402267456055, 0.57422614097595215, 0.57740211486816406]
listb [0.47309303283691406, 0.47619009017944336, 0.47628307342529297]
lcsa  [0.32805585861206055, 0.32813096046447754, 0.3349759578704834]
seta  [0.40036201477050781, 0.40322518348693848, 0.40548801422119141]
setb  [0.39103078842163086, 0.39722800254821777, 0.43811702728271484]
1.41852623806 1.18166313332 0.819398061028 1.45237674242 1.20986133789 0.838951479847

n = 6 18 12
lista [0.77897095680236816, 0.78187918663024902, 0.78467702865600586]
listb [0.629547119140625, 0.63210701942443848, 0.63321495056152344]
lcsa  [0.36563992500305176, 0.36638498306274414, 0.38175487518310547]
seta  [0.46695613861083984, 0.46992206573486328, 0.47583580017089844]
setb  [0.47616910934448242, 0.47661614418029785, 0.4850609302520752]
1.66818870637 1.34819326075 0.783028414812 1.63591241329 1.32210827369 0.767878297495

n = 7 21 14
lista [0.9703209400177002, 0.9734041690826416, 1.0182771682739258]
listb [0.82394003868103027, 0.82625699043273926, 0.82796716690063477]
lcsa  [0.40975093841552734, 0.41210508346557617, 0.42286920547485352]
seta  [0.5086359977722168, 0.50968098640441895, 0.51014018058776855]
setb  [0.48688101768493652, 0.4879908561706543, 0.49204087257385254]
1.90769222837 1.61990115188 0.805587768483 1.99293236904 1.69228211566 0.841583309951

n = 8 24 16
lista [1.204819917678833, 1.2206029891967773, 1.258256196975708]
listb [1.014998197555542, 1.0206191539764404, 1.0343101024627686]
lcsa  [0.50966787338256836, 0.51018595695495605, 0.51319599151611328]
seta  [0.50310111045837402, 0.50556015968322754, 0.51335406303405762]
setb  [0.51472997665405273, 0.51948785781860352, 0.52113485336303711]
2.39478683834 2.01748351664 1.01305257092 2.34068341135 1.97190418975 0.990165516871

n = 9 27 18
lista [1.511646032333374, 1.5133969783782959, 1.5639569759368896]
listb [1.2461750507354736, 1.254518985748291, 1.2613379955291748]
lcsa  [0.5565330982208252, 0.56119203567504883, 0.56451296806335449]
seta  [0.5966339111328125, 0.60275578498840332, 0.64791703224182129]
setb  [0.54694414138793945, 0.5508568286895752, 0.55375313758850098]
2.53362406013 2.08867620074 0.932788243907 2.76380331728 2.27843203069 1.01753187594

n = 10 30 20
lista [1.7777848243713379, 2.1453688144683838, 2.4085969924926758]
listb [1.5070111751556396, 1.5202279090881348, 1.5779800415039062]
lcsa  [0.5954139232635498, 0.59703707695007324, 0.60746097564697266]
seta  [0.61563014984130859, 0.62125110626220703, 0.62354087829589844]
setb  [0.56723213195800781, 0.57257509231567383, 0.57460403442382812]
2.88774814689 2.44791645689 0.967161734066 3.13413984189 2.6567803378 1.04968299523

Generated using a 2GHz single core machine with 2GB of RAM running Python 2.6.6 on a Debian flavour of Linux (with Firefox running in the background).

These figures are only a rough guide, since the actual speeds of the various algorithms are affected differently by the proportion of elements that are in both source lists.

PM 2Ring
  • 54,345
  • 6
  • 82
  • 182
18

A functional way can be achieved using filter and lambda operator.

list1 = [1,2,3,4,5,6]

list2 = [2,4,6,9,10]

>>> list(filter(lambda x:x in list1, list2))

[2, 4, 6]

Edit: It filters out x that exists in both list1 and list, set difference can also be achieved using:

>>> list(filter(lambda x:x not in list1, list2))
[9,10]

Edit2: python3 filter returns a filter object, encapsulating it with list returns the output list.

Ahmed Elsafty
  • 542
  • 4
  • 11
  • Please use the [edit] link to explain how this code works and don't just give the code, as an explanation is more likely to help future readers. See also [answer]. [source](http://stackoverflow.com/users/5244995) – Jed Fox Mar 29 '17 at 14:17
  • 1
    With Python3, this returns a filter object. You need to say `list(filter(lambda x:x in list1, list2))` to get it as a list. – Adrian W Jul 13 '19 at 20:07
  • This is quite good when list are unhashables! – dirac3000 Aug 25 '21 at 09:58
14
a = [1,2,3,4,5]
b = [1,3,5,6]
c = list(set(a).intersection(set(b)))

Should work like a dream. And, if you can, use sets instead of lists to avoid all this type changing!

Delimitry
  • 2,987
  • 4
  • 30
  • 39
Alex Hart
  • 1,663
  • 12
  • 15
11

You can also use numpy.intersect1d(ar1, ar2).
It return the unique and sorted values that are in both of two arrays.

6

This way you get the intersection of two lists and also get the common duplicates.

>>> from collections import Counter
>>> a = Counter([1,2,3,4,5])
>>> b = Counter([1,3,5,6])
>>> a &= b
>>> list(a.elements())
[1, 3, 5]
  • 2
    This should get more votes as it is symmetric (the top scoring answers aren't). Symmetry is when intersect(list_a, list_b) == intersect(list_b, list_a) – Brian Risk Sep 01 '22 at 16:26
4

In the case you have a list of lists map comes handy:

>>> lists = [[1, 2, 3], [2, 3, 4], [2, 3, 5]]
>>> set(lists.pop()).intersection(*map(set, lists))
{2, 3}

would work for similar iterables:

>>> lists = ['ash', 'nazg']
>>> set(lists.pop()).intersection(*map(set, lists))
{'a'}

pop will blow if the list is empty so you may want to wrap in a function:

def intersect_lists(lists):
    try:
        return set(lists.pop()).intersection(*map(set, lists))
    except IndexError: # pop from empty list
        return set()
Mr_and_Mrs_D
  • 32,208
  • 39
  • 178
  • 361
3

This is an example when you need Each element in the result should appear as many times as it shows in both arrays.

def intersection(nums1, nums2):
    #example:
    #nums1 = [1,2,2,1]
    #nums2 = [2,2]
    #output = [2,2]
    #find first 2 and remove from target, continue iterating

    target, iterate = [nums1, nums2] if len(nums2) >= len(nums1) else [nums2, nums1] #iterate will look into target

    if len(target) == 0:
            return []

    i = 0
    store = []
    while i < len(iterate):

         element = iterate[i]

         if element in target:
               store.append(element)
               target.remove(element)

         i += 1


    return store
2

It might be late but I just thought I should share for the case where you are required to do it manually (show working - haha) OR when you need all elements to appear as many times as possible or when you also need it to be unique.

Kindly note that tests have also been written for it.



    from nose.tools import assert_equal

    '''
    Given two lists, print out the list of overlapping elements
    '''

    def overlap(l_a, l_b):
        '''
        compare the two lists l_a and l_b and return the overlapping
        elements (intersecting) between the two
        '''

        #edge case is when they are the same lists
        if l_a == l_b:
            return [] #no overlapping elements

        output = []

        if len(l_a) == len(l_b):
            for i in range(l_a): #same length so either one applies
                if l_a[i] in l_b:
                    output.append(l_a[i])

            #found all by now
            #return output #if repetition does not matter
            return list(set(output))

        else:
            #find the smallest and largest lists and go with that
            sm = l_a if len(l_a)  len(l_b) else l_b

            for i in range(len(sm)):
                if sm[i] in lg:
                    output.append(sm[i])

            #return output #if repetition does not matter
            return list(set(output))

    ## Test the Above Implementation

    a = [1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
    b = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
    exp = [1, 2, 3, 5, 8, 13]

    c = [4, 4, 5, 6]
    d = [5, 7, 4, 8 ,6 ] #assuming it is not ordered
    exp2 = [4, 5, 6]

    class TestOverlap(object):

        def test(self, sol):
            t = sol(a, b)
            assert_equal(t, exp)
            print('Comparing the two lists produces')
            print(t)

            t = sol(c, d)
            assert_equal(t, exp2)
            print('Comparing the two lists produces')
            print(t)

            print('All Tests Passed!!')

    t = TestOverlap()
    t.test(overlap)

anabeto93
  • 307
  • 3
  • 12
2

Most of the solutions here don't take the order of elements in the list into account and treat lists like sets. If on the other hand you want to find one of the longest subsequences contained in both lists, you can try the following code.

def intersect(a, b):
    if a == [] or b == []: 
        return []
    inter_1 = intersect(a[1:], b)
    if a[0] in b:
        idx = b.index(a[0])
        inter_2 = [a[0]] + intersect(a[1:], b[idx+1:])        
        if len(inter_1) <= len(inter_2):
            return inter_2
    return inter_1

For a=[1,2,3] and b=[3,1,4,2] this returns [1,2] rather than [1,2,3]. Note that such a subsequence is not unique as [1], [2], [3] are all solutions for a=[1,2,3] and b=[3,2,1].

Community
  • 1
  • 1
Sven Meinhardt
  • 196
  • 2
  • 8
1

If, by Boolean AND, you mean items that appear in both lists, e.g. intersection, then you should look at Python's set and frozenset types.

Tim McNamara
  • 18,019
  • 4
  • 52
  • 83
1

You can also use a counter! It doesn't preserve the order, but it'll consider the duplicates:

>>> from collections import Counter
>>> a = [1,2,3,4,5]
>>> b = [1,3,5,6]
>>> d1, d2 = Counter(a), Counter(b)
>>> c = [n for n in d1.keys() & d2.keys() for _ in range(min(d1[n], d2[n]))]
>>> print(c)
[1,3,5]
Atonal
  • 520
  • 4
  • 14
1

when we used tuple and we want to intersection

a=([1,2,3,4,5,20], [8,3,9,5,1,4,20])

for i in range(len(a)):

    b=set(a[i-1]).intersection(a[i])
print(b)
{1, 3, 4, 5, 20}
0

Store number of Occurrences of each element of list b inside a dict. Then iterate on list a and whenever current element found in dict push it inside result array and decrease its occurrence by one

from collections import Counter
a = [1,2,3,4,5,3]
b = [1,3,5,6,3]
res = []
b_key = Counter(b)
for i in a:
    if i in b_key and b_key[i] > 0:
        res.append(i)
        b_key[i] -= 1
    
print(res)

output : [1, 3, 5, 3]