
I found the question Generators vs List Comprehension performance in Python, and instead of cProfile I used timeit.

from timeit import timeit
import cProfile

print timeit('sum([i for i in range(9999999)])', number=1)
print timeit('sum((i for i in range(9999999)))', number=1)

print cProfile.run('sum([i for i in xrange(9999999)])')
print cProfile.run('sum((i for i in xrange(9999999)))')

The result is:

LC timeit 0.728941202164
G timeit 0.643975019455
LC cProfile          3 function calls in 0.751 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.673    0.673    0.751    0.751 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        1    0.078    0.078    0.078    0.078 {sum}


None
G cProfile          10000003 function calls in 1.644 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
 10000000    0.843    0.000    0.843    0.000 <string>:1(<genexpr>)
        1    0.000    0.000    1.644    1.644 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        1    0.801    0.801    1.644    1.644 {sum}

I believe a generator should be better than a list comprehension, but in this case the result is not clear. My question is: which one is better to write?

sum((i for i in list_of_i))   # uses a single loop

vs

sum([i for i in list_of_i])   # seems to take two loops: one to build the list and one to sum it
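
For reference, a minimal self-contained sketch of timing the two forms against a pre-built list (the list size below is just an arbitrary placeholder):

from timeit import timeit

# build the list once in setup so only the summing strategies are timed
setup = 'list_of_i = range(1000000)'

print timeit('sum((i for i in list_of_i))', setup=setup, number=10)  # generator expression
print timeit('sum([i for i in list_of_i])', setup=setup, number=10)  # list comprehension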
James

3 Answers


In the simple case, it will be fastest to do this without a comprehension/generator:

sum(xrange(9999999))

Normally, if I need to do some sort of operation where I have to choose between a comprehension and a generator expression, I write:

sum(a*b for a, b in zip(c, d))
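
For anyone who wants to run that pattern as-is, here is a self-contained sketch with made-up lists c and d:

c = [1.0, 2.0, 3.0]
d = [4.0, 5.0, 6.0]

# generator expression as the sole argument to sum(): no extra parentheses,
# and no intermediate list is built
print sum(a*b for a, b in zip(c, d))  # 32.0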

Personally, I think that the generator expression (without the extra parentheses¹) looks nicer, and since readability counts, this outweighs any micro performance differences between the two expressions.

Generators will frequently be faster for things like this because they avoid creating an intermediate list (and the memory allocation associated with it). The timing difference is probably more pronounced as the list gets bigger, since the memory allocation and list resizing take more time for bigger lists. This isn't always the case, however (it is well documented on Stack Overflow that str.join works faster with lists than with generators in CPython, because when str.join gets a generator it constructs the list anyway...).
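
As a rough illustration of the str.join caveat, a sketch like the following shows the two variants side by side (absolute numbers are machine- and version-dependent):

from timeit import timeit

setup = 'words = ["spam"] * 1000'

# list comprehension: join can find out the total size up front
print timeit('"".join([w for w in words])', setup=setup, number=10000)
# generator expression: CPython's str.join builds a list from it internally anyway
print timeit('"".join(w for w in words)', setup=setup, number=10000)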

¹ You can omit the parentheses any time you pass a generator expression to a function as its only argument, which happens more frequently than you might expect...
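
For example, the parentheses become mandatory as soon as the call takes a second argument:

data = [1, 2, 3]

print sum(x*x for x in data)        # fine: the genexpr is the only argument
print sum((x*x for x in data), 10)  # parentheses required once a start value is added
# sum(x*x for x in data, 10)        # SyntaxError: the genexpr must be parenthesized here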

mgilson
    I think you mixed up comprehensions and genexps in a few places. – user2357112 Jun 04 '16 at 00:19
  • @user2357112 -- Long Friday I guess... Thanks. I suppose sometimes I've seen the term "comprehension" to mean "generator expression" even though it's not technically correct. I usually try to filter that out of any communications that I have, but apparently I don't always succeed :-( – mgilson Jun 04 '16 at 00:24
  • Building on this, I have a new question concerning generators vs list comprehension. You can find it [here](https://stackoverflow.com/questions/52138104/functions-where-the-total-iterable-is-needed-list-comprehension-vs-generator). – Bram Vanroy Sep 02 '18 at 14:52

Generators evaluate lazily; you have to make a call to get the next value every time you want one.

sum is an aggregate function, which operates on the entire iterable. All of the values have to be available for it to do its work.

The reason that the list comprehension works faster is that there is only one explicit call to build the entire list and one explicit operation to sum it all. With the generator, however, every item has to be fetched for it to perform its aggregation, and since there are roughly ten million of them, that results in roughly ten million calls.

This is one of those cases in which being eager is better for performance.
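
A tiny sketch of what "one call per value" looks like, pulling values out of a generator by hand with next():

gen = (i for i in range(3))

print next(gen)  # 0 -- each value is produced only when it is asked for...
print next(gen)  # 1
print next(gen)  # 2 -- ...and sum() has to make one such request per item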

Makoto
  • Take a closer look; the generator actually won the `timeit` timing. – user2357112 Jun 04 '16 at 00:10
  • @user2357112: It didn't. I ran it locally and observed that the generator took the longest time of them all. I admit that the way the question was formatted made that strange to read, which is why I ran it locally to verify the results for myself. – Makoto Jun 04 '16 at 00:11
  • I did the test myself and the generator expression won, as for the OP. – interjay Jun 04 '16 at 00:13
  • The generator consistently won in my tests on [PythonAnywhere's "Try IPython" thing](https://www.pythonanywhere.com/try-ipython/). Ideone timed out, but for slightly smaller inputs, the Ideone results were inconsistent, with the genexp usually doing slightly worse. – user2357112 Jun 04 '16 at 00:18
  • It could depend on the system you're running it on. You know, CPU bottleneck vs. memory bottleneck? – Karoly Horvath Jun 04 '16 at 00:19

The generator version won. The cProfile profiling simply introduced way more overhead for the genexp than the list comprehension, since it has a lot more points where the profiler butts in.
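
A quick way to see that the profiler hooks, rather than the generator itself, account for the extra time is to time the same statement with and without cProfile attached (a rough sketch; exact numbers will vary):

from timeit import timeit
import cProfile

stmt = 'sum(i for i in xrange(9999999))'

print timeit(stmt, number=1)  # plain timing: no profiler hook on each <genexpr> frame
cProfile.run(stmt)            # every resumption of <genexpr> now goes through the profiler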

user2357112