Nested Loop or 'in', which is faster?

Question

I am a python novice and was studying some basic coding challenges and was hoping to someone could explain which of the following snippets of code would run faster. The point is to see if there are pairs of integers within the list that add up to 100:

list = [1,2,3,99,5]
for i in list:
    for j in list:
       if i + j == 100:
          return True

or:

list = [1,2,3,99,5]
for i in list: 
    diff = 100 - i 
    if diff in list:
        return True

Look up TimeIt module and cProfile module to see how fast it runs and also what it does under the hood. — dfundako, Jun 05 '18 at 15:44
Both `in` and the `for` loop require an `O(n)` scan of the list, and both will break early if the item is found. You do fewer calculations in the latter, though, so I'd expect it to be a bit faster. — jonrsharpe, Jun 05 '18 at 15:44
Although if you changed the list to a set, the second example should be faster. — glibdud, Jun 05 '18 at 15:46
Side note: **never** name your variable after a built-in. Use `lst` or `list_` instead. — jpp, Jun 05 '18 at 15:46
Another side note: Study about `two pointer techniques`, it will help. — Arpit Kathuria, Jun 05 '18 at 15:47
If you cared about performance (and had larger lengths), you wouldn't write *either* of these. Use a data type -- like a set or dictionary -- where a search is O(1). — Charles Duffy, Jun 05 '18 at 15:50

Olivier Melançon · Accepted Answer · 2018-06-05T16:29:59.960

Benchmark

This homemade, randomized benchmark demonstrates that the solution using in is significantly faster in most case. I did not investigate, but I did encounter some runs where the solution with the nested for-loops was slightly faster when toying with the sample size.

import time, random

def time_it(f, rep=100000):
    sample = [[random.randint(0, 100) for _ in range(20)] for _ in range(rep // 100)]

    start = time.time()
    for i in range(rep):
        f(sample[i % len(sample)])
    return (time.time() - start)

def nested_for(lst):
    for i in lst:
        for j in lst:
            if i + j == 100:
                return True

def nested_in(lst):
    for i in lst:
        diff = 100 - i
        if diff in lst:
            return True

print('for:', time_it(nested_for))
print('in:', time_it(nested_in))

Output

for: 0.7093353271484375
in: 0.24253296852111816

Removing the assignation of j on every iteration is probably what removes a big overhead in the solution with the in.

Improvement

Although note that both solutions are O(n²). You can achieve O(n) by using a set. Since a set hashes its items, lookup is O(1).

def contains_diff(lst):
    elements = set(lst)
    return any(100 - i in elements for i in elements)

print(contains_diff([1, 2, 3, 99])) # True
print(contains_diff([1, 2, 3])) # False

Interestingly enough, if you benchmark the above it will be generally slower than the in solution. This is because the probability of in finding a sum of 100 quickly in a randomized list is relatively high. If you let the difference you want to find grow, then the overhead of building a set is rapidly compensated by the speed of set lookup.

Sidenote

As a sidenote, you should not be using the list as a variable name as it overwrites the builtin list.

I removed my answer because this is a better explanation but can you avoid the questioner's use of overriding the built-in "list" — Zev, Jun 05 '18 at 16:00
+1 This was a fantastic explanation, and what I was looking for. As for the use of list as a variable name, it was just for the sake of the question. Thanks so much! — Carlos Romero, Jun 05 '18 at 16:18

Nested Loop or 'in', which is faster?

1 Answers1

Benchmark

Output

Improvement

Sidenote