0

Let's assume we have two lists containing unique values and want to find the values that are in both lists using list comprehension.

a = [1,3,5]
b = [3,5,7]

c = [x for x in a if x in b]

print(c)
[3,5]

Simple enough. Now, what if each list had 1 million elements that we wanted to compare? List comprehension would continue to compare every element in list a to every element in list b, even after it has found '5' (from the example above) in both lists. Is that correct?

Would removing an element from the lists when it is found in both lists be more efficient to shorten the comparison time as it loops? Or is there something else I've probably missed?

for x in a:
    if x in b:
        c.append(x)
        a.remove(x)
        b.remove(x)

print(a)
[1]
print(b)
[7]
print(c)
[3,5]
dubch87
  • 101
  • 2
  • 5

1 Answers1

1

Removing x from your list a in

for x in a:
    if x in b:
        c.append(x)
        a.remove(x)
        b.remove(x)

would add extra time complexity for the removing of the item. Whenever you call the remove it's of O(n) complexity with n being the number of items in the list. You could write a second for loop which makes it a bit faster, because you could "break" whenever you find the element. However, I think the biggest performance gainer is using a set, because of the O(1) lookup time. You can read about a set here:

I import the performance counter from the time library to measure the time.

from time import perf_counter

I generate two lists with unique elements:

a = [x for x in range(10000)]
b = [x * 2 for x in range(10000)]

I measure the time from your operation mentioned above:

start_list = perf_counter()
c = [x for x in a if x in b]
stop_list = perf_counter()
print(f"Calculating with list operations took {stop_list - start_list}s")

I measure the time via set operations:

start_set = perf_counter()
d = list(set(a) & set(b))
stop_set = perf_counter()
print(f"Calculating with set operations took {stop_set - start_set}s")

Just to make sure the two methods give the same result:

assert c == d

Output:

Calculating with list operations took 0.796774061s
Calculating with set operations took 0.0013706330000000655s
Christian Weiss
  • 119
  • 1
  • 14