30
>>> a = [1,1,1,2,3,4,4]
>>> b = [1,1,2,3,3,3,4]

[1,1,2,3,4]

Please note this is not the same question as this: Python intersection of two lists keeping duplicates Because even though there are three 1s in list a, there are only two in list b so the result should only have two.

Community
  • 1
  • 1
Kevin
  • 977
  • 1
  • 10
  • 17
  • 1
    How is it not a duplicate? Do you understand the definition of intersection? The problem with that question is that the duplicates of the second list were removed by `set()` – OneCricketeer Jun 05 '16 at 18:03
  • it looks you have to work with 2 steps there. 1) create a list with the intersection. 2) check if the number is present in both lists, and if so, append it to the list with the intersections – dot.Py Jun 05 '16 at 18:05
  • @OneCricketeer (that comment is old but) you can see that the two questions are different. This question is the "correct" multiset intersection, while the other is just "list of elements in list1 that is in list2". I don't know if there's a good way to explain it, but if you known you can edit to clarify the other question. – user202729 Aug 15 '21 at 03:35

6 Answers6

56

You can use collections.Counter for this, which will provide the lowest count found in either list for each element when you take the intersection.

from collections import Counter

c = list((Counter(a) & Counter(b)).elements())

Outputs:

[1, 1, 2, 3, 4]
miradulo
  • 28,857
  • 6
  • 80
  • 93
  • How if I want to find the intersection of multiple lists? – Giang Nguyen Feb 25 '21 at 06:19
  • For computing the intersection between >=3 lists see [Python -Intersection of multiple lists? - Stack Overflow](https://stackoverflow.com/questions/3852780/python-intersection-of-multiple-lists) – user202729 Aug 15 '21 at 03:56
8

Simple with no additional imports and easy to debug :)

Disadvantage: The value of list b is changed. Work on a copy of b if you don't want to change b.

c = list()
for x in a:
    if x in b:
        b.remove(x)
        c.append(x)
mujjiga
  • 16,186
  • 2
  • 33
  • 51
  • Downside: It's `O(mn)` (where `m` is the length of `a` and `n` is the length of `b`). Optimal solutions ([using stuff like `Counter`](https://stackoverflow.com/a/37645155/364696)) are `O(m + n)`. For the tiny inputs in the OP's case it won't matter, but for not-all-that-big inputs (say, 1000 items a piece), the difference between roughly 2000 "units of work" and 1,000,000 "units of work" is quite significant. – ShadowRanger Nov 10 '22 at 22:28
6

The accepted solution posted using Counter is simple, but I think this approach using a dictionary will work too and can be faster -- even on lists that aren't ordered (that requirement wasn't really mentioned, but at least one of the other solutions assumes that is the case).

a = [1, 1, 1, 2, 3, 4, 4]
b = [1, 1, 2, 3, 3, 3, 4]
    
def intersect(nums1, nums2):
    match = {}
    for x in nums1:
        if x in match:
            match[x] += 1
        else:
            match[x] = 1
            
    i = []
    for x in nums2:
        if x in match:
            i.append(x)
            match[x] -= 1
            if match[x] == 0:
                del match[x]

    return i

def intersect2(nums1, nums2):
    return list((Counter(nums1) & Counter(nums2)).elements())

timeit intersect(a,b)
100000 loops, best of 3: 3.8 µs per loop

timeit intersect2(a,b)
The slowest run took 4.90 times longer than the fastest. This could mean 
that an intermediate result is being cached.
10000 loops, best of 3: 20.4 µs per loop

I tested with lists of random ints of size 1000 and 10000 and it was faster there too.

a = [random.randint(0,100) for r in xrange(10000)]
b = [random.randint(0,100) for r in xrange(1000)]

timeit intersect(a,b)
100 loops, best of 3: 2.35 ms per loop

timeit intersect2(a,b)
100 loops, best of 3: 4.2 ms per loop

And larger lists that would have more common elements

a = [random.randint(0,10) for r in xrange(10000)]
b = [random.randint(0,10) for r in xrange(1000)]

timeit intersect(a,b)
100 loops, best of 3: 2.07 ms per loop

timeit intersect2(a,b)
100 loops, best of 3: 3.41 ms per loop
ShadowRanger
  • 143,180
  • 12
  • 188
  • 271
delsauce
  • 81
  • 1
  • 4
  • 1
    are `i` and `c` supposed to be the same thing, maybe? `i` isn't changed after its initialized, and then it's returned as an empty list. – nsheff Jan 12 '22 at 16:20
  • err.. yeah. Fixed, thanks. – delsauce Jan 12 '22 at 23:54
  • unnecessarily long codes are bad. You can do the same in one line. c = [x for x in a if x in b] – LazerDance Jul 25 '22 at 15:09
  • @delsauce: For the record, the slow `Counter` issue is likely exacerbated because you were using Python 2. In Python 3.2 and higher, `Counter` has a C accelerator for counting long inputs which runs *much* faster than any loop you can write in Python. The raw `dict` solution has some tiny advantages (streaming one input instead of two `Counter`s), but it's still likely to lose in most cases simply because the accelerator makes the counting *so* much faster (and the intersection operation, while implemented in Python, ends up cheaper since each input is already deduped to a count). – ShadowRanger Nov 10 '22 at 22:44
  • Your streaming solution does have the advantage of completely preserving order of appearance in the second sequence (flip which one is counted and which one is streamed and it would preserve order of appearance from the first sequence instead, which I'd personally find more predictable) . – ShadowRanger Nov 10 '22 at 22:46
4

This should also works.

a = [1, 1, 1, 2, 3, 4, 4]
b = [1, 1, 2, 3, 3, 3, 4]
c = []
i, j = 0, 0
while i < len(a) and j < len(b):
    if a[i] == b[j]:
        c.append(a[i])
        i += 1
        j += 1
    elif a[i] > b[j]:
        j += 1
    else:
        i += 1

print(c) # [1, 1, 2, 3, 4]
Kentaro Wada
  • 505
  • 1
  • 4
  • 13
2

This should also work:

def list_intersect(lisA, lisB):
    """ Finds the intersection of 2 lists including common duplicates"""

    Iset = set(lisA).intersection(set(lisB))
    Ilis = []
    for i in Iset:
        num = min(lisA.count(i), lisB.count(i))
        for j in range(num):
            Ilis.append(i)
    return Ilis
DrFalcon
  • 121
  • 2
1

This will do:

from itertools import chain
list(chain.from_iterable([(val,)*min(a.count(val), b.count(val)) for val in (set(a) & set(b))]))

Gives:

[1, 1, 2, 3, 4]
j4hangir
  • 2,532
  • 1
  • 13
  • 14