Benchmark
This homemade, randomized benchmark demonstrates that the solution using `in` is significantly faster in most cases. I did not investigate further, but I did encounter some runs where the solution with the nested for-loops was slightly faster when toying with the sample size.
```python
import time
import random

def time_it(f, rep=100000):
    sample = [[random.randint(0, 100) for _ in range(20)] for _ in range(rep // 100)]
    start = time.time()
    for i in range(rep):
        f(sample[i % len(sample)])
    return time.time() - start

def nested_for(lst):
    for i in lst:
        for j in lst:
            if i + j == 100:
                return True

def nested_in(lst):
    for i in lst:
        diff = 100 - i
        if diff in lst:
            return True

print('for:', time_it(nested_for))
print('in:', time_it(nested_in))
```
Output

```
for: 0.7093353271484375
in: 0.24253296852111816
```
Removing the assignment of `j` on every iteration is probably what eliminates a big part of the overhead in the solution with `in`.
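As a sanity check, one could also compare the two approaches with the standard library's `timeit` module, which handles the repetition loop itself; the sample size and the target value 100 below are assumptions chosen to match the benchmark above:

```python
import random
import timeit

def nested_for(lst):
    for i in lst:
        for j in lst:
            if i + j == 100:
                return True

def nested_in(lst):
    for i in lst:
        if 100 - i in lst:
            return True

# one fixed random sample so both functions see identical input
sample = [random.randint(0, 100) for _ in range(20)]

# timeit.timeit runs the callable `number` times and returns total seconds
t_for = timeit.timeit(lambda: nested_for(sample), number=10000)
t_in = timeit.timeit(lambda: nested_in(sample), number=10000)
print(f'for: {t_for:.4f}s  in: {t_in:.4f}s')
```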
Improvement
Note, though, that both solutions are O(n²). You can achieve O(n) by using a `set`. Since a `set` hashes its items, lookup is O(1) on average.
```python
def contains_diff(lst):
    elements = set(lst)
    return any(100 - i in elements for i in elements)

print(contains_diff([1, 2, 3, 99]))  # True
print(contains_diff([1, 2, 3]))      # False
```
Interestingly enough, if you benchmark the above, it will generally be slower than the `in` solution. This is because the probability of `in` quickly finding a pair summing to 100 in a randomized list is relatively high. If you let the difference you want to find grow, the overhead of building a `set` is rapidly compensated by the speed of `set` lookup.
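A minimal sketch of that effect, assuming a much larger target (`10**6` here, an arbitrary choice) and a larger sample so that matches are rare and every element must be checked:

```python
import random
import time

TARGET = 10**6  # large target: pairs summing to TARGET are rare

def nested_in(lst):
    for i in lst:
        if TARGET - i in lst:  # O(n) lookup in a list
            return True
    return False

def contains_diff(lst):
    elements = set(lst)  # O(n) to build, then O(1) average lookups
    return any(TARGET - i in elements for i in elements)

sample = [random.randint(0, TARGET) for _ in range(2000)]

start = time.time()
nested_in(sample)
print('in:', time.time() - start)

start = time.time()
contains_diff(sample)
print('set:', time.time() - start)
```

With rare matches, the list version degrades to a full O(n²) scan while the set version stays O(n).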
Sidenote
As a sidenote, you should not use `list` as a variable name, as it shadows the builtin `list`.
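A quick illustration of why that shadowing bites (a hypothetical snippet, not from the original code):

```python
list = [1, 2, 3]     # shadows the builtin `list`
# calling list((1, 2)) here would raise TypeError: 'list' object is not callable
del list             # remove the shadowing name
print(list((1, 2)))  # the builtin works again: prints [1, 2]
```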