How to efficiently get common items from two lists that may have duplicates?

Question

my_list = ['a', 'b', 'a', 'd', 'e', 'f']
my_list_2 = ['a', 'b', 'c']

The common items are:

c = ['a', 'b', 'a']

The code:

for e in my_list:
   if e in my_list_2:
      c.append(e)
      ...

If the my_list is long, this would be very inefficient. If I convert both lists into two sets, then use set's intersection() function to get the common items, I will lose the duplicates in my_list.

How to deal with this efficiently?

The situation you've shown actually runs in linear time relative to the size of the list, since hashtable lookup (for `my_dict`) is constant-time. I don't think you an get better than that. — Green Cloak Guy, Mar 04 '22 at 22:52
Does this answer your question? [Answer to Get difference between two lists](https://stackoverflow.com/a/3462202/4541045) — ti7, Mar 04 '22 at 23:10

ti7 · Answer 1 · 2022-03-04T23:14:21.370

1

dict is already a hashmap, so lookups are practically as efficient as a set, so you may not need to do any extra work collecting the values - if it wasn't, you could pack the values into a set to check before checking the dict

However, a large improvement may be to make a generator for the values, rather than creating a new intermediate list, to iterate over where you actually want the values

def foo(src_dict, check_list):
    for value in check_list:
        if value in my_dict:
            yield value

With the edit, you may find you're better off packing all the inputs into a set

def foo(src_list, check_list):
    hashmap = set(src_list)
    for value in check_list:
        if value in hashmap:
            yield value

If you know a lot about the inputs, you can do better, but that's an unusual case (for example if the lists are ordered you could bisect, or if you have a huge verifying list or very very few values to check against it you may find some efficiency in the ordering and if you make a set)

edited Mar 04 '22 at 23:14

answered Mar 04 '22 at 22:55

ti7

16,375
6
40
68

Can you see my updated question? I can convert the dict into a list or set first, then the question is how to get common items with duplicates? – marlon Mar 04 '22 at 22:58
Only one new list 'c' is created. Why do you think a new intermediate list created? – marlon Mar 04 '22 at 23:02
if you create a generator, you don't need to create `c` at all and can instead iterate over the result (as presumably is wanted later)! `for value in foo(src_dict, check_list): ...` – ti7 Mar 04 '22 at 23:04
Hi, I still have to go rough each element in your 'check_list' variable. If this list is long, then it's an efficiency issue. Is there a way to avoid iterating through the. long list? If I convert this list into a set.then I lose the duplicates in the list. – marlon Mar 04 '22 at 23:28
@marlon this actually has a surprising number of questions associated with it and being very clear is your best chance of getting a good Answer! .. for example, the resulting order of elements will depend on which list is iterated over first if no other ordering happens .. and if the ordering and quantity is only taken from one list, making a `set()` of the other is likely a speedup .. please do prepare some more examples of what different inputs and outputs would look like if my Answer doesn't work for your case [`timeit` can help compare](https://stackoverflow.com/a/24105845/4541045) – ti7 Mar 04 '22 at 23:42

score 0 · Answer 2 · answered Mar 04 '22 at 22:53

0

I am not sure about time efficiency, but, personally speaking, list comprehension would always be more of interest to me:

[x for x in my_list if x in my_list_2]

Output

['a', 'b', 'a']

answered Mar 04 '22 at 22:53

TheFaultInOurStars

3,464
1
8
29

Can it be more efficient for two lists that I want to keep duplicates? – marlon Mar 04 '22 at 22:55
my_list_2 can be converted to a set, but still have to go through each element in the first list. – marlon Mar 04 '22 at 23:00
@marlon my_list_2 is actually a dictionary. So in this variable, there is no duplicated value. – TheFaultInOurStars Mar 04 '22 at 23:04

score 0 · Answer 3 · answered Mar 04 '22 at 23:41

First, utilize the set.intersection() method to get the intersecting values in the list. Then, use a nested list comprehension to include the duplicates based on the original list's count on each value:

my_list = ['a', 'b', 'a', 'd', 'e', 'f']
my_list_2 = ['a', 'b', 'c']

c = [x for x in set(my_list).intersection(set(my_list_2)) for _ in range(my_list.count(x))]
print(c)

The above may be slower than just

my_list = ['a', 'b', 'a', 'd', 'e', 'f']
my_list_2 = ['a', 'b', 'c']

c = []
for e in my_list:
   if e in my_list_2:
      c.append(e)
print(c)

But when the lists are significantly larger, the code block utilizing the set.intersection() method will be significantly more efficient (faster).

ahmedul Kabir Omi · Answer 4 · 2022-03-05T00:20:29.287

sorry for not reading the post carefully and now it is not possible to delete.. however, it is an attempt for solution.

c = lambda my_list, my_list_2: (my_list, my_list_2, list(set(my_list).intersection(set(my_list_2))))
print("(list_1,list_2,duplicate_items) -", c(my_list, my_list_2))

Output:

(list_1,list_2,duplicate_items) ->  (['a', 'b', 'a', 'd', 'e', 'f'], ['a', 'b', 'c'], ['b', 'a'])

or can be

[i for i in my_list if i in my_list_2]

output:

['a', 'b', 'a']

How to efficiently get common items from two lists that may have duplicates?

4 Answers4

Output