0

I want to efficiently find the intersection of two lists , keeping duplicates from both, e.g. A=[1,1,2,3], B=[1,1,2,4] should return [1,1,1,1,2]

I know a similar question was asked previously (Python intersection of two lists keeping duplicates) however this does not help me because only the duplicates from one list are retained.

The following works

def intersect(A,B):
    C=[]
    for a in A:
        for b in B:
            if a==b:
                C.append(a)
    return C

however it isn't efficient enough for what I'm doing! To speed things up I tried sorting the lists

def intersect(A,B):
    A.sort()
    B.sort()
    C=[]
    i=0
    j=0
    while i<len(A) and j<len(B):
        if A[i]<=B[j]:
            if A[i]==B[j]: 
                C.append(A[i])
            i+=1
        else:
            j=j+1
    return C

however this only keeps the duplicates from list B. Any suggestions?

Community
  • 1
  • 1
user141647
  • 13
  • 2
  • Just to clarify by "intersection of the lists" here you mean "if an item occurs N times in list A and M times in list B, it should occur N+M times in the new list, but if it either N or M is zero then it should occur zero times in the new list"? (That is an unusual use of the term "intersection".) – BrenBarn Apr 23 '15 at 17:16
  • 4
    You have 4 ```1```'s in the result because there are 2 in ```A``` and 2 in ```B```, so why does the result not have two ```2```s? – bj0 Apr 23 '15 at 17:18
  • 1
    I realized my above interpretation is incorrect based on your data. Why do you have only one 2 in the example result, but four 1s? – BrenBarn Apr 23 '15 at 17:18
  • Why is your solution not fast enough for the real case? How is it different? Can you elaborate? What I'm interested in knowing is whether you have a lot of unique values in your list or not? – tommy.carstensen Apr 23 '15 at 17:43
  • 1
    The reason there are four 1s there is because each 1 in list A is matched with the 1s in list B, whereas the 2 in list A is matched uniquely to the 2 in list B. Another example is A=[1,1,2,3], B=[1,2,4] should return [1,1,2] – user141647 Apr 23 '15 at 17:52
  • 1
    Are you working with large lists, or are you working with small lists but many intersections? Are there many repeating elements in your lists? Please show some examples for the time. – Ella Sharakanski Apr 23 '15 at 18:24
  • This is pretty similar to [python - Intersection of two lists including duplicates? - Stack Overflow](https://stackoverflow.com/questions/37645053/intersection-of-two-lists-including-duplicates);; **however** in this question, if an item occurs N times in list A and M times in list B, it should occur N×M times in the new list. – user202729 Aug 15 '21 at 03:43

3 Answers3

3

Here is the answer to your question as asked:

import collections
for A,B,expected_output in (
    ([1,1,2,3], [1,1,2,4], [1,1,1,1,2]),
    ([1,1,2,3], [1,2,4], [1,1,2])):
    cntA = collections.Counter(A)
    cntB = collections.Counter(B)
    output = [
        x for x in sorted(set(A) & set(B)) for i in range(cntA[x]*cntB[x])]
    assert output == expected_output

Here is the answer to the question as originally interpreted by myself and two others:

import collections
A=[1,1,2,3]
B=[1,1,2,4]
expected_output = [1,1,1,1,2,2]
cntA = collections.Counter(A)
cntB = collections.Counter(B)
cnt_sum = collections.Counter(A) + collections.Counter(B)
output = [x for x in sorted(set(A) & set(B)) for i in range(cnt_sum[x])]
assert output == expected_output

You can find the collections.Counter() documentation here. collections is a great module and I highly recommend giving the documentation on the whole module a read.

I realized you don't actually need to find the intersection of the sets, because the "count of a missing element is zero" according to the documentation:

import collections
for A,B,expected_output in (
    ([1,1,2,3], [1,1,2,4], [1,1,1,1,2]),
    ([1,1,2,3], [1,2,4], [1,1,2])):
    cntA = collections.Counter(A)
    cntB = collections.Counter(B)
    output = [
        x for x in sorted(set(A)) for i in range(cntA[x]*cntB[x])]
    assert output == expected_output
tommy.carstensen
  • 8,962
  • 15
  • 65
  • 108
2

How about this:

a_set = set(A)
b_set = set(B)
intersect = [i for i in A if i in b_set] + [j for j in B if j in a_set]

Two list comprehensions concatenated. A bit of extra time and memory is used to create sets of A and B, but that will be more than offset by the efficiency of checking membership of items in a set vs list.

You could also spruce it up a bit:

set_intersect = set(A) & set(B)
list_intersect = [ele for ele in A+B if ele in set_intersect]

Coerce both lists to sets, take their intersection, then use a list comprehension to add all elements from both lists A and B if they appear in the set intersection.

paidhima
  • 2,312
  • 16
  • 13
0

I'm having a hard time speeding your code since I don't know what are you running it on. It makes a lot of difference whether you run it on small or large lists and how many distinct elements are there. Anyway, here are some suggestions:

1.

def intersect(a, b):
    count_a = Counter(a)
    count_b = Counter(b)
    count_mul = []
    for i in count_a:
        count_mul.extend([i] * (count_a[i] * count_b[i]))
    return count_mul

2.

This returns an iterator, you can use list(iterator) to turn it into a list

def intersect(a, b):
    count_a = Counter(a)
    count_b = Counter(b)
    count_mul = Counter()
    for i in count_a:
        count_mul[i] += count_a[i] * count_b[i]
    return count_mul.elements()

3.

Very similar to your way but without changing the size of the list which takes time.

def intersect(A, B):
    return [a for a in A for b in B if a == b]

I'm not sure this gives any improvement to your original way, it really depends on the inputs, but your way is O(n*m) and mine is O(n+m).

You can use the module timeit to check how fast it runs on your input:

from timeit import timeit
timeit('test.intersect(A, B)', 'import test; A = [1,1,2,3]; B = [1,1,2,4]')
Ella Sharakanski
  • 2,683
  • 3
  • 27
  • 47