7

I just read another users question while looking for a way to compute the differences in two lists.

Python, compute list difference

My question is why would I do

def diff(a,b):
    b = set(b)
    return [aa for aa in a if aa not in b]

rather than doing

def diff(a,b):
    tmp = []
    for i in a:
        if(i not in b):
            tmp.append(i)
return tmp

edit: just noticed the second diff function actually returned the similarities. It should be correct now.

Community
  • 1
  • 1
Jake
  • 4,134
  • 2
  • 16
  • 20

5 Answers5

8

Just from an algorithmic perspective, it takes O(n) to construct the set and O(n) to do the list comprehension (since testing if an element is contained in a set is O(1)). However in the second example, it would take O(n^2) to loop through both lists. So regardless of the programming language, the first approach is superior.

Also, list comprehensions in python are inherently faster than a for loop. This reduces the constant factor even more (and significantly so too). The reason why can be summarized in this post which I quote here:

The fact that list comprehensions can only be comprised of expressions, not statements, is a considerable factor, as much less work is required behind the scenes for each iteration. Another factor is that the underlying iteration mechanism for list comprehensions is much closer to a C loop than execution of a for loop.

tskuzzy
  • 35,812
  • 14
  • 73
  • 140
  • 1
    Good explanation of why the latter is O(n^2). Also, here's some `timeit` timings I did on the list comprehension vs for loop approach (list comprehensions are about twice as fast in my example): https://gist.github.com/2647005 – Ben Hoyt May 09 '12 at 17:34
  • wow... that is some upsetting evidence. I'm implementing the first one for sure. – Jake May 09 '12 at 17:42
5

The main difference between the two options is that the one that uses set is asymptotically much more efficient.

Under reasonably favourable conditions, looking up an item in a set can be done in O(1) time; looking up an item in a list requires O(n) time.

The second, less significant, difference is that one version uses a list comprehension while the other uses a for loop. List comprehensions tend to produce more compact code. They also tend to be more efficient (although if performance is a concern, the only way to get an accurate picture is by running benchmarks).

NPE
  • 486,780
  • 108
  • 951
  • 1,012
2

List comprehension is generally regarded as more efficient than using regular list operations when creating a list. If memory is a concern, you may want to use a generator instead.

This provides a bit of information re performances of for-loops, map and list comprehension. @benhoyt also provided a useful link comparing loops vs list comprehension.

However, please note that if performance is a specific concern, it might be worth your while to time/benchmark your various options to select the best option for your specific requirements.

Levon
  • 138,105
  • 33
  • 200
  • 191
  • How much faster is a list comprehension in this instance? – Gabe May 09 '12 at 17:20
  • 2
    A list comprehension might be *slightly* faster than building a list with append, but not much. The real gain of sets over lists here is that `elem in container` is O(len(container)) if `container` is a list, but O(1) if `container` is a set. – Ben Hoyt May 09 '12 at 17:23
  • @Gabe I added a link to my post that talks about for-loops vs map vs list comprehensions that might be helpful. – Levon May 09 '12 at 17:30
2

Using the set is usually faster because it only has to iterate b once, while your example has to iterate b once for each element in a.

Gabe
  • 84,912
  • 12
  • 139
  • 238
2

I've done some tests:

test_lists.py

a = range(1, 1000)
b = range(2, 1002)

tmp = []
for i in a:
    if(i not in b):
        tmp.append(i)

test_set_list_comprehensions.py

a = range(1, 1000)
b = range(2, 1002)

b = set(b)
[aa for aa in a if aa not in b]

test_set.py

a = range(1, 1000)
b = range(2, 1002)
list(set(a).difference(set(b)))

And that's what timeit says:

~$ python -m timeit 'import test_lists'
1000000 loops, best of 3: 0.671 usec per loop
~$ python -m timeit 'import test_set_list_comprehension'
1000000 loops, best of 3: 0.766 usec per loop
~$ python -m timeit 'import test_set'
1000000 loops, best of 3: 0.656 usec per loop

So the best one seems to be:

test_set.py

a = range(1, 1000)
b = range(2, 1002)
list(set(a).difference(set(b)))
raulcumplido
  • 405
  • 1
  • 3
  • 6