1

I recently wrote a quick and dirty BFS implementation, to find diamonds in a directed graph. The BFS loop looked like this:

while toVisit:
    y = toVisit.pop()
    if y in visited: return "Found diamond"
    visited.add(y)
    toVisit.extend(G[y])

(G is the graph - a dictionary from node names to the lists of their neighbors)

Then comes the interesting part: I thought that list.pop() is probably too slow, so I ran a profiler to compare the speed of this implementation with deque.pop - and got a bit of an improvement. Then I compared it with y = toVisit[0]; toVisit = toVisit[1:], and to my surprise, the last implementation is actually the fastest one.

Does this make any sense? Is there any performance reason to ever use list.pop() instead of the apparently much faster two-liner?

Russia Must Remove Putin
  • 374,368
  • 89
  • 403
  • 331
Guy Adini
  • 5,188
  • 5
  • 32
  • 34
  • 2
    This doesn't make sense. Could you share your micro-benchmark results in more detail? – Eli Bendersky May 07 '12 at 06:51
  • 5
    Also, these are not equivalent. `pop()` takes from the back, your two liner takes from the front. – Eric W. May 07 '12 at 06:52
  • 5
    `toVisit.pop()` does not do the same thing as `y = toVisit[0]; toVisit = toVisit[1:]`: `pop()`'s default index is -1 (pop last), not 0 (pop first). – AKX May 07 '12 at 06:52
  • Good point. But the same repeats with list.pop(0). Might be worth noting that I'm using Python 2.7. – Guy Adini May 07 '12 at 06:56
  • @Eli: I profile with ipython's run -p. Is that output clear enough? – Guy Adini May 07 '12 at 06:56
  • 2
    `pop(0)` is O(N) as all the items must be shifted. `pop()` however, is known to be a very fast operation as it removes an item from the end of the list, requiring no shifting. – jamylak May 07 '12 at 06:59
  • 1
    Guy: I ran simple profiling and `pop` is *way* faster – Eli Bendersky May 07 '12 at 07:02
  • On a side note, I don't think your loop detection algorithm works. For example, in a diamond shaped graph like A->B, A->C, B->D, C->D, your algorithm will wrongly detect a cycle. Use DFS instead. See http://stackoverflow.com/questions/2525282/how-to-detect-if-a-directed-graph-is-cyclic – spinlok May 07 '12 at 07:05
  • @spinlok - Thanks. I know that, but I was meaning to look for "non DAGs". I edited the question accordingly. – Guy Adini May 07 '12 at 10:30
  • -1 because, given the answer, the question title is misleading. – Evgeni Sergeev Jan 18 '14 at 06:17
  • @EvgeniSergeev - I agree. But the traffic generated around it is interesting, so I'm not deleting it. – Guy Adini Jan 22 '14 at 19:15

2 Answers2

19

You have measured wrong. With cPython 2.7 on x64, I get the following results:

$ python -m timeit 'l = list(range(10000))' 'while l: l = l[1:]'
10 loops, best of 3: 365 msec per loop
$ python -m timeit 'l = list(range(10000))' 'while l: l.pop()'
1000 loops, best of 3: 1.82 msec per loop
$ python -m timeit 'import collections' \
         'l = collections.deque(list(range(10000)))' 'while l: l.pop()'
1000 loops, best of 3: 1.67 msec per loop
phihag
  • 278,196
  • 72
  • 453
  • 469
  • Same on my machine. I guess this means that my question is about ipython's profiling... – Guy Adini May 07 '12 at 07:02
  • `l = list(range(10000))` takes a significant time to run. You can run `python -m timeit 'l = list(range(10000))'` and subtract that time from the combined time – John La Rooy May 07 '12 at 07:15
  • 2
    @gnibbler The proper way is really to use `timeit`'s "setup" feature properly, like phihag did. – Amber May 07 '12 at 07:18
  • 1
    @Amber, you can't use the setup feature (`-s`) here as the list would be empty after the first run. – John La Rooy May 07 '12 at 07:23
  • 1
    Just a small point but that answer is technically measuring different things. ```pop``` without parameters defaults to **last** element. But the task it to remove **first** element. The answer is still ```pop``` is faster, but the numbers are different. (https://docs.python.org/2/library/stdtypes.html) – Alex Feb 26 '16 at 10:18
  • and deque should be popleft() not pop() – Alex Feb 26 '16 at 10:26
  • Technically true but if you're only doing `pop(0)` you could reverse the list and achieve the same results in `pop()` time. Since you're destroying the list it shouldn't matter if you modify it beforehand – Chris Rudd Sep 09 '22 at 00:32
0

Use generators for perfomance

python -m timeit 'import itertools' 'l=iter(xrange(10000))' 'while next(l, None): l,a = itertools.tee(l)'

1000000 loops, best of 3: 0.986 usec per loop

  • How is that an answer? And posting the benchmark results without comparison doesn't mean much. – Dirk Jan 19 '14 at 12:10
  • That's an addition for phihag's answer. in all(his and mine) examples l is the rest of the list on each iteration. time is mesuared equivalently. his code creates kind of copy of a list on each iteration, mine doesn't. – user3212026 Mar 22 '14 at 16:15