
I'm trying to understand the performance of a generator function. I've used `cProfile` and the `pstats` module to collect and inspect profiling data. The function in question is this:

def __iter__(self):
    delimiter  = None
    inData     = self.inData
    lenData    = len(inData)
    cursor     = 0
    while cursor < lenData:
        if delimiter:
            mo = self.stringEnd[delimiter].search(inData[cursor:])
        else:
            mo = self.patt.match(inData[cursor:])
        if mo:
            mo_lastgroup = mo.lastgroup
            mstart       = cursor
            mend         = mo.end()
            cursor       += mend
            delimiter = (yield (mo_lastgroup, mo.group(mo_lastgroup), mstart, mend))
        else:
            raise SyntaxError("Unable to tokenize text starting with: \"%s\"" % inData[cursor:cursor+200])

`self.inData` is a unicode text string, `self.stringEnd` is a dict with 4 simple regexes, and `self.patt` is one big regex. The whole thing splits the big string into smaller strings, one by one.
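For context, here is a runnable toy version of the class together with a loop that drives it; the regexes, token names, and sample input are made up, the real ones are more complex:

    import re

    class Scanner(object):
        def __init__(self, inData):
            self.inData = inData
            # One big tokenizing regex with named groups (made up for this sketch).
            self.patt = re.compile(r"(?P<word>\w+)|(?P<space>\s+)|(?P<quote>')")
            # Per-delimiter regexes that consume up to the closing delimiter.
            self.stringEnd = {u"'": re.compile(r"(?P<string_body>[^']*')")}

        def __iter__(self):
            # generator body exactly as shown above
            delimiter = None
            inData = self.inData
            lenData = len(inData)
            cursor = 0
            while cursor < lenData:
                if delimiter:
                    mo = self.stringEnd[delimiter].search(inData[cursor:])
                else:
                    mo = self.patt.match(inData[cursor:])
                if mo:
                    mo_lastgroup = mo.lastgroup
                    mstart = cursor
                    mend = mo.end()
                    cursor += mend
                    delimiter = (yield (mo_lastgroup, mo.group(mo_lastgroup), mstart, mend))
                else:
                    raise SyntaxError("Unable to tokenize text starting with: \"%s\""
                                      % inData[cursor:cursor + 200])

    # The first token must be requested with next(); afterwards send() both
    # resumes the generator and tells it which string-end regex (if any)
    # to use for the next chunk.
    tokens = iter(Scanner(u"hello 'quoted text' world"))
    token = next(tokens)
    while True:
        name, text, start, length = token
        print("%s: %r" % (name, text))
        delim = u"'" if name == "quote" else None  # switch regexes after an opening quote
        try:
            token = tokens.send(delim)
        except StopIteration:
            break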

Profiling a program that uses it, I found that the biggest part of the program's run time is spent in this function:

In [800]: st.print_stats("Scanner.py:124")

         463263 function calls (448688 primitive calls) in 13.091 CPU seconds

   Ordered by: cumulative time
   List reduced from 231 to 1 due to restriction <'Scanner.py:124'>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    10835   11.465    0.001   11.534    0.001 Scanner.py:124(__iter__)

But looking at the profile of the function itself, very little time is spent in the functions it calls:

In [799]: st.print_callees("Scanner.py:124")
   Ordered by: cumulative time
   List reduced from 231 to 1 due to restriction <'Scanner.py:124'>

Function                  called...
                              ncalls  tottime  cumtime
Scanner.py:124(__iter__)  ->   10834    0.006    0.006  {built-in method end}
                               10834    0.009    0.009  {built-in method group}
                                8028    0.030    0.030  {built-in method match}
                                2806    0.025    0.025  {built-in method search}
                                   1    0.000    0.000  {len}

The rest of the function is little more than the `while` loop, assignments, and `if`-`else`. Even the generator's `send` method, which I use, is fast:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
13643/10835    0.007    0.000   11.552    0.001 {method 'send' of 'generator' objects}

Is it possible that the `yield`, passing a value back to the consumer, is taking the majority of the time?! Is there anything else that I'm not aware of?

EDIT:

I probably should have mentioned that the generator function `__iter__` is a method of a small class, so `self` refers to an instance of this class.

ThomasH
  • How big is `inData`? Slicing it repeatedly might not be very efficient. Maybe try using `islice` from `itertools` instead. See if that makes any difference. – Dunes Jun 08 '11 at 20:45
  • @Dunes Thanks, will try. The performance data was taken on a string of roughly 1MB. - If you put this in an answer, I could upvote it. – ThomasH Jun 08 '11 at 21:01
  • Have you tried [this](http://stackoverflow.com/questions/4295799/how-to-improve-performance-of-this-code/4299378#4299378)? – Mike Dunlavey Jun 09 '11 at 04:13
  • @Mike No, but interesting approach, thanks for the hint. I was following up on Dunes' comment, and it seems that slicing of large strings (and into large slices) is really a time hog. I'm trying to find a way around that. Following your approach I would probably need to break up some composed statements, so I could see that on the stack traces. – ThomasH Jun 09 '11 at 07:51
  • @Dunes There is indeed a performance hit slicing a big string (esp. when the slice itself is big). But it seems that `islice` objects cannot be passed as the "string" argument to regex functions - which is what I need. – ThomasH Jun 09 '11 at 10:01
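For illustration, a minimal snippet showing why: the regex pattern methods require a string (or buffer), so the iterator that `islice` returns is rejected:

    import itertools
    import re

    patt = re.compile("abc")
    sub = itertools.islice("xxabc", 2, None)  # an iterator, not a string
    patt.match(sub)  # raises TypeError: expected string or buffer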

2 Answers


This is actually Dunes's answer; unfortunately he only gave it as a comment and doesn't seem inclined to post it as a proper answer.

The main performance culprit was the string slices. Some timing measurements showed that slicing performance degrades perceptibly with big slices (i.e., taking a big slice from an already big string). To work around that I now use the `pos` parameter of the regex object methods:

    if delimiter:
        # search/match from cursor onwards; no copy of the string tail is made
        mo = self.stringEnd[delimiter].search(inData, pos=cursor)
    else:
        mo = self.patt.match(inData, pos=cursor)

Note that with `pos` the match object's offsets are indices into the whole string, so the cursor update becomes `cursor = mo.end()` instead of `cursor += mo.end()`.

Thanks to all who helped.
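A rough micro-benchmark illustrates the difference (string size and pattern are made up for illustration):

    import timeit

    setup = "import re; data = u'a' * 1000000; patt = re.compile(u'a' * 10)"

    # Each call first copies the 500,000-character tail of the string ...
    print(timeit.timeit("patt.match(data[500000:])", setup=setup, number=10000))
    # ... whereas pos lets the regex engine start mid-string without a copy.
    print(timeit.timeit("patt.match(data, pos=500000)", setup=setup, number=10000))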

ThomasH
  • Ah, sorry. I was very busy at work the past few days. I only figured out the problem; my suggested solution wasn't adequate. So credit to you for finding the solution. – Dunes Jun 11 '11 at 13:24
  • @Dunes I was asking for the problem, so your comment was quite adequate. Next time :-) . – ThomasH Jun 11 '11 at 19:53

If I'm reading your sample correctly, you are taking a generator object, putting it into `delimiter`, and using it for a dict lookup. That may not be your speed issue, but I'm pretty sure that's a bug.

Nick ODell
  • If you refer to the `delimiter = (yield ...)` part, no. This function is a **coroutine**, which allows the user to do `co.send(x)`, which resumes execution (like `next(generator)`) and makes `(yield ...)` evaluate to `x` (if you simply use it as an iterable, it evaluates to `None` IIRC). – delnan Jun 08 '11 at 18:28
  • Yes, as delnan wrote, I sometimes pass short strings into the generator (using `.send` from the outside) to make it switch to a different regex for the next chunk. – ThomasH Jun 08 '11 at 18:54
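To illustrate the coroutine behaviour delnan describes, a tiny standalone example:

    def echo():
        received = None
        while True:
            # (yield ...) hands a value to the consumer, then evaluates to
            # whatever the consumer passes back in via send().
            received = (yield received)

    co = echo()
    next(co)              # prime the coroutine: run to the first yield
    print(co.send(42))    # (yield ...) evaluates to 42, which is yielded back -> 42
    print(co.send("hi"))  # -> hi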