Python: intersection of 2 lists keeping duplicates from both lists

Question

I want to efficiently find the intersection of two lists , keeping duplicates from both, e.g. A=[1,1,2,3], B=[1,1,2,4] should return [1,1,1,1,2]

I know a similar question was asked previously (Python intersection of two lists keeping duplicates) however this does not help me because only the duplicates from one list are retained.

The following works

def intersect(A,B):
    C=[]
    for a in A:
        for b in B:
            if a==b:
                C.append(a)
    return C

however it isn't efficient enough for what I'm doing! To speed things up I tried sorting the lists

def intersect(A,B):
    A.sort()
    B.sort()
    C=[]
    i=0
    j=0
    while i<len(A) and j<len(B):
        if A[i]<=B[j]:
            if A[i]==B[j]: 
                C.append(A[i])
            i+=1
        else:
            j=j+1
    return C

however this only keeps the duplicates from list B. Any suggestions?

Just to clarify by "intersection of the lists" here you mean "if an item occurs N times in list A and M times in list B, it should occur N+M times in the new list, but if it either N or M is zero then it should occur zero times in the new list"? (That is an unusual use of the term "intersection".) — BrenBarn, Apr 23 '15 at 17:16
You have 4 ```1```'s in the result because there are 2 in ```A``` and 2 in ```B```, so why does the result not have two ```2```s? — bj0, Apr 23 '15 at 17:18
I realized my above interpretation is incorrect based on your data. Why do you have only one 2 in the example result, but four 1s? — BrenBarn, Apr 23 '15 at 17:18
Why is your solution not fast enough for the real case? How is it different? Can you elaborate? What I'm interested in knowing is whether you have a lot of unique values in your list or not? — tommy.carstensen, Apr 23 '15 at 17:43
The reason there are four 1s there is because each 1 in list A is matched with the 1s in list B, whereas the 2 in list A is matched uniquely to the 2 in list B. Another example is A=[1,1,2,3], B=[1,2,4] should return [1,1,2] — user141647, Apr 23 '15 at 17:52
Are you working with large lists, or are you working with small lists but many intersections? Are there many repeating elements in your lists? Please show some examples for the time. — Ella Sharakanski, Apr 23 '15 at 18:24
This is pretty similar to [python - Intersection of two lists including duplicates? - Stack Overflow](https://stackoverflow.com/questions/37645053/intersection-of-two-lists-including-duplicates);; **however** in this question, if an item occurs N times in list A and M times in list B, it should occur N×M times in the new list. — user202729, Aug 15 '21 at 03:43

tommy.carstensen · Accepted Answer · 2015-04-23T19:01:38.643

3

Here is the answer to your question as asked:

import collections
for A,B,expected_output in (
    ([1,1,2,3], [1,1,2,4], [1,1,1,1,2]),
    ([1,1,2,3], [1,2,4], [1,1,2])):
    cntA = collections.Counter(A)
    cntB = collections.Counter(B)
    output = [
        x for x in sorted(set(A) & set(B)) for i in range(cntA[x]*cntB[x])]
    assert output == expected_output

Here is the answer to the question as originally interpreted by myself and two others:

import collections
A=[1,1,2,3]
B=[1,1,2,4]
expected_output = [1,1,1,1,2,2]
cntA = collections.Counter(A)
cntB = collections.Counter(B)
cnt_sum = collections.Counter(A) + collections.Counter(B)
output = [x for x in sorted(set(A) & set(B)) for i in range(cnt_sum[x])]
assert output == expected_output

You can find the collections.Counter() documentation here. collections is a great module and I highly recommend giving the documentation on the whole module a read.

I realized you don't actually need to find the intersection of the sets, because the "count of a missing element is zero" according to the documentation:

import collections
for A,B,expected_output in (
    ([1,1,2,3], [1,1,2,4], [1,1,1,1,2]),
    ([1,1,2,3], [1,2,4], [1,1,2])):
    cntA = collections.Counter(A)
    cntB = collections.Counter(B)
    output = [
        x for x in sorted(set(A)) for i in range(cntA[x]*cntB[x])]
    assert output == expected_output

edited Apr 23 '15 at 19:01

answered Apr 23 '15 at 17:56

tommy.carstensen

8,962
15
65
108

I want the code to do the same job as the first piece of code in the question but more efficiently, the expected output is [1,1,1,1,2] – user141647 Apr 23 '15 at 18:04
The top one works exactly as I expect it to and has drastically sped up my code. Thank you! – user141647 Apr 23 '15 at 18:30
1

`cnt_sum` is redundant in the first one – Ella Sharakanski Apr 23 '15 at 18:53
@EllaShar Sorry, I forgot to delete that. Fixed. – tommy.carstensen Apr 23 '15 at 18:54
And also maybe changing `range` to `xrange` might speed it a bit – Ella Sharakanski Apr 23 '15 at 19:05
1

@EllaShar Yes, definitely do xrange instead of range if Python2. I assumed Python3, but question just tagged Python. – tommy.carstensen Apr 23 '15 at 19:08
1

@EllaShar: As a rule of thumb, it's preferable to provide an answer that works with both Python 2 and Python 3, especially when the version is not stated or tagged in the question. – John Y Apr 23 '15 at 19:15
Didn't know there is no `xrange` in python 3. Sorry – Ella Sharakanski Apr 23 '15 at 19:16

score 2 · Answer 2 · answered Apr 23 '15 at 17:37

How about this:

a_set = set(A)
b_set = set(B)
intersect = [i for i in A if i in b_set] + [j for j in B if j in a_set]

Two list comprehensions concatenated. A bit of extra time and memory is used to create sets of A and B, but that will be more than offset by the efficiency of checking membership of items in a set vs list.

You could also spruce it up a bit:

set_intersect = set(A) & set(B)
list_intersect = [ele for ele in A+B if ele in set_intersect]

Coerce both lists to sets, take their intersection, then use a list comprehension to add all elements from both lists A and B if they appear in the set intersection.

this gives [1,1,1,1,2,2] for the example I gave, not [1,1,1,1,2] — user141647, Apr 23 '15 at 18:05

Ella Sharakanski · Answer 3 · 2015-04-23T19:27:25.787

I'm having a hard time speeding your code since I don't know what are you running it on. It makes a lot of difference whether you run it on small or large lists and how many distinct elements are there. Anyway, here are some suggestions:

1.

def intersect(a, b):
    count_a = Counter(a)
    count_b = Counter(b)
    count_mul = []
    for i in count_a:
        count_mul.extend([i] * (count_a[i] * count_b[i]))
    return count_mul

2.

This returns an iterator, you can use list(iterator) to turn it into a list

def intersect(a, b):
    count_a = Counter(a)
    count_b = Counter(b)
    count_mul = Counter()
    for i in count_a:
        count_mul[i] += count_a[i] * count_b[i]
    return count_mul.elements()

3.

Very similar to your way but without changing the size of the list which takes time.

def intersect(A, B):
    return [a for a in A for b in B if a == b]

I'm not sure this gives any improvement to your original way, it really depends on the inputs, but your way is O(n*m) and mine is O(n+m).

You can use the module timeit to check how fast it runs on your input:

from timeit import timeit
timeit('test.intersect(A, B)', 'import test; A = [1,1,2,3]; B = [1,1,2,4]')

the inputs will be tuples of length 3 consisting of large integers — user141647, Apr 23 '15 at 19:09

Python: intersection of 2 lists keeping duplicates from both lists

3 Answers3

Linked