2

I have a generator that generates a finite sequence. To determine the length of this sequence I tried these two approaches:

 seq_len = sum([1 for _ in euler14_seq(sv)])  # list comp

and

 seq_len = sum(1 for _ in euler14_seq(sv))    # generator expression

sv is a constant starting value for the sequence.

I had expected that list comprehension would be slower and the generator expression faster, but it turns out the other way around.

I assume the first one will be much more memory intensive since it creates a complete list in memory first - part of the reason I also thought it would be slower.

My question: Is this observation generalizable? And is this due to having two generators involved in the second statement vs the first?

I've looked at these What's the shortest way to count the number of items in a generator/iterator?, Length of generator output, and Is there any built-in way to get the length of an iterable in python? and saw some other approaches to measuring the length of a sequence, but I'm specifically curious about the comparison of list comp vs generator expression.

PS: This came up when I decided to solve Euler Project #14 based on a question asked on SO yesterday.

(By the way, what's the general feeling regarding use of the '_' in places where variable values are not needed).

This was done with Python 2.7.2 (32-bit) under Windows 7 64-bit

Community
  • 1
  • 1
Levon
  • 138,105
  • 33
  • 200
  • 191
  • Using '_' is the norm for unwanted vars - it's the common and accepted practice - but I'm afraid the rest I can't help you with. – Jon Clements Jul 05 '12 at 23:32
  • 1
    There is a fair bit more work setting up the generator. Once the sequence is long enough the generator expression should be faster – John La Rooy Jul 05 '12 at 23:34
  • What about `len(list(euler14_seq(sv)))`? Also, if you're testing performance like this you should say what platform you are testing on. In particular Python version and OS and possibly hardware information. – Peter Graham Jul 05 '12 at 23:34
  • I think that in short, generators introduce a `__call__` overhead (which can be slow in Python) and a listcomp does not, but that's a gut feeling I can't back up with references... – Jon Clements Jul 05 '12 at 23:36
  • ... and also that a listcomp's statement is more optimisable as the code executed is static, while a generator can bubble around and pursue other things at will – Jon Clements Jul 05 '12 at 23:39
  • @PeterGraham I added some more information about Python version/OS to my post - thanks for suggesting that. I could try `len()` on *both* expressions, but I would expect the result to not change. Also, by applying `list()` to the generator aren't you mostly eliminating the comparison of generator vs list comp? – Levon Jul 05 '12 at 23:51
  • @Levon: I was suggesting another way to find the length of a generator. It's probably not relevant to whether a list comprehension or generator expression is faster. The difference is that once you have a list `len` is very fast and O(1) while `sum` is likely far slower and O(n) – Peter Graham Jul 06 '12 at 00:01

2 Answers2

5

On this computer, the generator expression becomes faster somewhere between 100,000 and 1,000,000

$ python -m timeit "sum(1 for x in xrange(100000))"
10 loops, best of 3: 34.8 msec per loop
$ python -m timeit "sum([1 for x in xrange(100000)])"
10 loops, best of 3: 20.8 msec per loop
$ python -m timeit "sum(1 for x in xrange(1000000))"
10 loops, best of 3: 315 msec per loop
$ python -m timeit "sum([1 for x in xrange(1000000)])"
10 loops, best of 3: 469 msec per loop
Peter Wood
  • 23,859
  • 5
  • 60
  • 99
John La Rooy
  • 295,403
  • 53
  • 369
  • 502
  • I cranked up the value of `sv` to 1000000000, still list comp is faster (25 us/loop vs 30 us/loop) .. – Levon Jul 05 '12 at 23:43
  • 1
    @Levon, I think the main hit to the LC is the memory allocation. Your's is a lot faster than mine. For a busy web server the tradeoff can become tricky as storing those temporary lists in your precious RAM can be more expensive than the extra cycles used by the generator – John La Rooy Jul 05 '12 at 23:49
  • Perhaps, I do have a lot of RAM available. And your point in the comments re setup for the generator is a good one too. Still a bit puzzling. I'll have to play around with this some more. Thanks. – Levon Jul 06 '12 at 00:02
3

The following code block should generate the length:

>>> gen1 = (x for x in range(10))
>>> len(list(gen1))
10
Ashwini Chaudhary
  • 244,495
  • 58
  • 464
  • 504
  • I didn't want to mix up `len()` vs `sum()` in order to keep the comparison true. And the basic gist of my question is to figure out if "list comp" will always be faster than "generator expressions" in general. – Levon Jul 05 '12 at 23:32
  • @Levon take a look at this : http://stackoverflow.com/questions/47789/generator-expressions-vs-list-comprehension – Ashwini Chaudhary Jul 05 '12 at 23:37
  • Thanks .. I'll take a close look at that (just quickly skimmed it) – Levon Jul 06 '12 at 00:00