
What is happening? Can somebody explain to me what happens here? In a tight loop I changed:

##            j=i
##            while j < ls - 1 and len(wordlist[j]) > lc: j+=1
            j = next(j for j in range(i,ls) if len(wordlist[j]) <=  lc)

With the commented-out while version, the whole program ran in 625 ms; with the next() generator version, it ran in 2.125 s.

What can be the reason that this more Pythonic version causes such a catastrophe in performance?
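
For anyone who wants to reproduce the comparison, here is a minimal self-contained sketch of the two variants (the wordlist, i, ls and lc values here are hypothetical stand-ins, not my real data):

    # Minimal reproduction sketch (hypothetical data).
    wordlist = ["longword", "another", "ok", "word"]
    ls, lc, i = len(wordlist), 3, 0

    # Variant 1: explicit while loop.
    j = i
    while j < ls - 1 and len(wordlist[j]) > lc:
        j += 1

    # Variant 2: generator expression consumed by next().
    k = next(k for k in range(i, ls) if len(wordlist[k]) <= lc)

    print(j == k == 2)  # True: both find the first short-enough word (index 2)

Note that the two variants differ when no word qualifies: the while loop stops at ls - 1, whereas next() raises StopIteration.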

EDIT: Maybe it is caused by the use of the psyco module? At any rate, the running time with Python 2.7, which does not have psyco, was 2.141 s for the next version, i.e. almost the same as Python 2.6 with psyco.
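
For reference, psyco is enabled with the usual guarded import (a sketch of the pattern, not my exact module layout):

    # Typical guarded psyco setup: JIT-compile everything when available.
    try:
        import psyco
        psyco.full()
    except ImportError:
        pass  # e.g. Python 2.7, for which psyco does not exist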

After removing the *.pyc files, the code did not slow down. Then, when I also removed the import of psyco from the library module, I got the Python 2.6 timings for the runs without psyco as well. Results for the non-psyco and psyco versions (now the library routine slows down too, so its timing is also relevant):

Without psyco:

  1. while: preparation in library: 532 ms, total running time 2.625 s
  2. next: preparation in library: 532 ms, total running time (time.clock()): 2.844 s (version with xrange same wall time)

With psyco:

  1. while: preparation in library: 297 ms, total running time: 609..675 ms
  2. next: preparation in library: 297 ms, total running time: 1.922 s (version with range instead of xrange everywhere in the program: 1.985 s)

Running on a Windows XP AMD Sempron 3100+ system with 2 GB RAM. Counting the loops and calls with two globals:

    # inside the counted function; needs `global callcount, loopcount`
    j = i
    callcount += 1
    while j < ls - 1 and len(wordlist[j]) > lc:
        j += 1
        loopcount += 1

Result for the test input with psyco:

Finished in 625 ms
Loopcount: 78317
Callcount: 47970
Ratio: 1.633

So the loop is inside a tight loop, but on average it executes only a couple of times (notice that the two increments of the global counters did not slow the code down under psyco).

CONCLUSIONS: The algorithm is highly sensitive to vocabulary length, which is why I used this loop to drop some impossible words from consideration. But the base cases of the recursion are later checked by dictionary lookup, which is O(1) on average, so this earlier optimization, beneficial as it looked, is no longer very beneficial. Even with longer input, moving the callcount counter to the beginning of the function showed that the call count is not affected by the vocabulary length, though the outer loop count is slightly reduced (the code originally posted is in the elif part of an if statement).
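
To illustrate the point about the base case, here is a self-contained toy sketch of the pattern (hypothetical names, not my actual code): the recursion bottoms out in a set lookup, so pre-filtering too-long words saves little.

    # Toy sketch: the base case is an O(1) average-time set lookup.
    def segmentations(s, vocabulary):
        # Yield every way to split s into words from vocabulary.
        if s in vocabulary:                # base case: one set lookup
            yield [s]
        for cut in range(1, len(s)):
            if s[:cut] in vocabulary:      # again O(1) on average
                for rest in segmentations(s[cut:], vocabulary):
                    yield [s[:cut]] + rest

    vocab = set(["a", "an", "apple", "pp", "le"])
    print(list(segmentations("anapple", vocab)))
    # -> [['an', 'apple'], ['an', 'a', 'pp', 'le']]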

Longer runs (29,372 solutions) with the while loop and with the whole loop removed (using i instead of j); library preparation 312 ms:

  1. Without the loop: elif branch count: 485488, outerloopcount: 10129147, ratio: 0.048, runtime 6.000 s (without counters: 4.594 s)
  2. With the loop: loopcount: 19355114, outercount: 8194033, ratio: 0.236, runtime 5.704 s (without counters: 4.688 s)

(running time without the loop, counters and psyco: 32.792 s, library 608 ms)

So without the extra counters, the benefit of this loop under psyco in the harder case is (4688 - 4594) * 100 / 4688.0 ≈ 2 %.

This inspired me to reverse another earlier optimization, which I had wondered about on DaniWeb. An earlier version of the code ran faster when the smallest word size was a global variable, not a parameter. According to the documentation, local variable access is faster, but apparently the cost of making the recursion heavier outweighed that. Now, in the harder case, this other reversal of an optimization brought the expected performance behaviour for the case without the word-length optimization: with psyco, preparation took 312 ms and total running time was 4.469..4.484 s. So this made the code cleaner and brought more benefit in this case than the removed loop had. Putting the parameter back into the version with the while loop did not change the running time much (though the variation became greater for the library preparation code).
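
Schematically, the two shapes compared here are (a hypothetical minimal example, not the actual recursion):

    # Hypothetical minimal example of the two shapes.
    SMALLEST = 2  # after the reversal: a module-level global

    def depth_global(n):
        if n <= SMALLEST:           # global name lookup on every call
            return n
        return depth_global(n - 1)

    def depth_param(n, smallest=2):
        if n <= smallest:           # local lookup, but the extra argument
            return n                # travels through every recursive frame
        return depth_param(n - 1, smallest)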

**What I learned from this: if you do n optimizations for speed, you must re-check the first n-1 optimizations after doing the nth one.**
Tony Veijalainen
  • If you really want to compare, you should probably be using `xrange()`. – Amber Oct 10 '10 at 22:47
  • does it happen without psyco? – aaronasterling Oct 10 '10 at 22:52
  • the second version doesn't really seem to be more pythonic to me. – Winston Ewert Oct 10 '10 at 23:04
  • I haven't been able to reproduce what you're seeing, although I have to guess a lot about what `ls`, `lc` and `wordlist` are and how often you run the code. Even so, it's entirely likely that Amber's suggestion of `xrange()` fixes it. (Amber should post it as an answer.) If it doesn't, you should provide more information about how the code is run. – Thomas Wouters Oct 11 '10 at 00:17
  • As per musicfreak's comment to my answer, could you please benchmark this with Psyco disabled? Also, how meaty is the code inside your loop (*i.e.* how many iterations are we talking here)? JIT compilation will tend to improve the performance as the number of iterations increases. – user359996 Oct 11 '10 at 00:46
  • My speed tests did not show a noticeable difference, as the ranges are short, but the algorithm is highly non-linear in its input size (the input itself and the vocabulary it enables). – Tony Veijalainen Oct 11 '10 at 16:10

2 Answers


I've found that using generators can often be slower than generating the whole list, which is a little counter-intuitive. I've managed to fix performance bottlenecks just by adding a `[]` pair, turning a generator expression into a list comprehension.

For example, compare these:

$ python -m timeit -n 1000 "' '.join(c for c in 'hello world')"
1000 loops, best of 3: 6.11 usec per loop
$ python -m timeit -n 1000 "' '.join([c for c in 'hello world'])"
1000 loops, best of 3: 3.79 usec per loop

It's almost twice as quick to generate the whole list first rather than use a generator even for such a simple case!
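
The same comparison can also be run from the `timeit` module directly (a quick sketch; the absolute numbers will of course vary by machine and Python version):

    # Programmatic version of the shell comparison above.
    import timeit

    gen = timeit.timeit("' '.join(c for c in 'hello world')", number=100000)
    lst = timeit.timeit("' '.join([c for c in 'hello world'])", number=100000)
    print("genexpr: %.3f s  listcomp: %.3f s" % (gen, lst))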

Edit: As Thomas Wouters points out in the comments, the reason the generator is slower here is that it's such a simple case. For balance, here is his test, in which the generator is the clear winner:

$ python -m timeit -s "s = 'hello world' * 10000" -s "class C: pass" "for i in (C() for c in s): pass"
10 loops, best of 3: 33.6 msec per loop
$ python -m timeit -s "s = 'hello world' * 10000" -s "class C: pass" "for i in [C() for c in s]: pass"
10 loops, best of 3: 172 msec per loop
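
(The difference in this case: each `C()` is a distinct new object, so the list comprehension has to allocate and keep all 110,000 instances alive before the loop even starts, while the generator creates and discards them one at a time.)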
Scott Griffiths
  • Yes, a generator has to do a tiny bit of extra work for each element compared to creating a list and then iterating over it. However, whether that is enough to notice depends greatly on how well the full list fits in memory (which is not easy to see just by looking at the code). In your example, the list is tiny, creating the full list will be fast, and you're really only measuring the speed of the iteration (you don't spend time anywhere else). Try it with, say, `python -m timeit -s "s = 'hello world' * 10000" "' '.join(c for c in s)"` instead and you'll see the generator can be quite a bit faster. – Thomas Wouters Oct 11 '10 at 11:02
  • @Thomas: Good points, but I still get the generator being slower for your example (11 ms vs. 8 ms), and increasing the string length further doesn't change this. – Scott Griffiths Oct 11 '10 at 11:12
  • Although you may want to change the loop to `c for c in s if 0` to reduce the noise of creating the result string :) – Thomas Wouters Oct 11 '10 at 11:13
  • Yeah, I forgot about the optimizations involved here, especially string interning. The difference won't be easily noticeable with this minimal amount of work; you need to grow the list *itself* to beyond what fits into memory, the strings don't take up any extra memory. Using something other than a string may also show a better difference. – Thomas Wouters Oct 11 '10 at 11:16
  • Here's a version that shows the difference when not just caching strings: http://paste.pocoo.org/show/273935/ – Thomas Wouters Oct 11 '10 at 11:20
  • The reason a list is faster with `str.join` is because the algorithm needs to go through the data twice (once for memory allocation and second time for copying), as explained in [this answer](https://stackoverflow.com/a/26635939/3767239). – a_guest Apr 24 '19 at 13:58

The two are not equivalent.

j = i
while j < ls - 1 and len(wordlist[j]) > lc:
    j += 1

will stop the while loop as soon as `len(wordlist[j]) <= lc`. It could conceivably go through the loop zero times, if the first word in the list is already shorter than or equal to lc characters.

j = next(j for j in range(i,ls) if len(wordlist[j]) <=  lc)

will continue iterating through the whole range i to ls, regardless of the length of the words in the list.

Edit: Ignore the above - as Amber pointed out, the call to next() means that the generator expression is only evaluated up until the first result is returned. In that case I suspect the time difference comes from using range() instead of xrange() (unless this is Python 3.x). In Python 2.x range() will create the full list in memory, even if the generator expression only returns the first value.
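
A sketch of that suggested fix (Python 2.x; the data here is hypothetical, just to make it runnable):

    # xrange() yields indices lazily instead of materialising the full
    # list that range() builds in Python 2.x.
    wordlist = ["alpha", "be", "gamma", "pi"]   # hypothetical data
    i, ls, lc = 0, len(wordlist), 2

    j = next(j for j in xrange(i, ls) if len(wordlist[j]) <= lc)
    print(j)  # -> 1 ("be" is the first word with length <= 2)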

Dave Kirby
  • Not actually true. Generators are lazily-evaluated, and thus calling `next()` will only grab the first element in the result, which means that the generator won't evaluate anything beyond where the `if` condition is true. – Amber Oct 10 '10 at 22:48
  • @Amber: damn, you are right. I completely overlooked the call to next(). – Dave Kirby Oct 10 '10 at 22:53