
I'm trying to understand the performance of a generator function. I've used `cProfile` and the `pstats` module to collect and inspect profiling data. The function in question is this:

def __iter__(self):
    delimiter  = None
    inData     = self.inData
    lenData    = len(inData)
    cursor     = 0
    while cursor < lenData:
        if delimiter:
            mo = self.stringEnd[delimiter].search(inData[cursor:])
        else:
            mo = self.patt.match(inData[cursor:])
        if mo:
            mo_lastgroup = mo.lastgroup
            mstart       = cursor
            mend         = mo.end()
            cursor       += mend
            delimiter = (yield (mo_lastgroup, mo.group(mo_lastgroup), mstart, mend))
        else:
            raise SyntaxError("Unable to tokenize text starting with: \"%s\"" % inData[cursor:cursor+200])

`self.inData` is a unicode text string, `self.stringEnd` is a dict with 4 simple regexes, and `self.patt` is one big regex. The whole thing splits the big string into smaller strings, one by one.
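For context, here is a runnable toy version of the class together with a loop that drives it; the regexes, token names, and sample input are made up, the real ones are more complex:

    import re

    class Scanner(object):
        def __init__(self, inData):
            self.inData = inData
            # One big tokenizing regex with named groups (made up for this sketch).
            self.patt = re.compile(r"(?P<word>\w+)|(?P<space>\s+)|(?P<quote>')")
            # Per-delimiter regexes that consume up to the closing delimiter.
            self.stringEnd = {u"'": re.compile(r"(?P<string_body>[^']*')")}

        def __iter__(self):
            # generator body exactly as shown above
            delimiter = None
            inData = self.inData
            lenData = len(inData)
            cursor = 0
            while cursor < lenData:
                if delimiter:
                    mo = self.stringEnd[delimiter].search(inData[cursor:])
                else:
                    mo = self.patt.match(inData[cursor:])
                if mo:
                    mo_lastgroup = mo.lastgroup
                    mstart = cursor
                    mend = mo.end()
                    cursor += mend
                    delimiter = (yield (mo_lastgroup, mo.group(mo_lastgroup), mstart, mend))
                else:
                    raise SyntaxError("Unable to tokenize text starting with: \"%s\""
                                      % inData[cursor:cursor + 200])

    # The first token must be requested with next(); afterwards send() both
    # resumes the generator and tells it which string-end regex (if any)
    # to use for the next chunk.
    tokens = iter(Scanner(u"hello 'quoted text' world"))
    token = next(tokens)
    while True:
        name, text, start, length = token
        print("%s: %r" % (name, text))
        delim = u"'" if name == "quote" else None  # switch regexes after an opening quote
        try:
            token = tokens.send(delim)
        except StopIteration:
            break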

Profiling a program that uses it, I found that the biggest part of the program's run time is spent in this function:

In [800]: st.print_stats("Scanner.py:124")

         463263 function calls (448688 primitive calls) in 13.091 CPU seconds

   Ordered by: cumulative time
   List reduced from 231 to 1 due to restriction <'Scanner.py:124'>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    10835   11.465    0.001   11.534    0.001 Scanner.py:124(__iter__)

But looking at the profile of the function itself, very little time is spent in the functions it calls:

In [799]: st.print_callees("Scanner.py:124")
   Ordered by: cumulative time
   List reduced from 231 to 1 due to restriction <'Scanner.py:124'>

Function                  called...
                              ncalls  tottime  cumtime
Scanner.py:124(__iter__)  ->   10834    0.006    0.006  {built-in method end}
                               10834    0.009    0.009  {built-in method group}
                                8028    0.030    0.030  {built-in method match}
                                2806    0.025    0.025  {built-in method search}
                                   1    0.000    0.000  {len}

The rest of the function is little more than the `while` loop, assignments, and `if`-`else`. Even the generator's `send` method, which I use, is fast:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
13643/10835    0.007    0.000   11.552    0.001 {method 'send' of 'generator' objects}

Is it possible that the `yield`, passing a value back to the consumer, is taking the majority of the time?! Is there anything else that I'm not aware of?

EDIT:

I probably should have mentioned that the generator function `__iter__` is a method of a small class, so `self` refers to an instance of this class.

ThomasH
  • How big is `inData`? Slicing it repeatedly might not be very efficient. Maybe try using `islice` from `itertools` instead. See if that makes any difference. – Dunes Jun 08 '11 at 20:45
  • @Dunes Thanks, will try. The performance data was taken on a string of roughly 1MB. - If you put this in an answer, I could upvote it. – ThomasH Jun 08 '11 at 21:01
  • Have you tried [this](http://stackoverflow.com/questions/4295799/how-to-improve-performance-of-this-code/4299378#4299378)? – Mike Dunlavey Jun 09 '11 at 04:13
  • @Mike No, but interesting approach, thanks for the hint. I was following up on Dunes' comment, and it seems that slicing of large strings (and into large slices) is really a time hog. I'm trying to find a way around that. Following your approach I would probably need to break up some composed statements, so I could see that on the stack traces. – ThomasH Jun 09 '11 at 07:51
  • @Dunes There is indeed a performance hit slicing a big string (esp. when the slice itself is big). But it seems that `islice` objects cannot be passed as the "string" argument to regex functions - which is what I need. – ThomasH Jun 09 '11 at 10:01
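For illustration, a minimal snippet showing why: the regex pattern methods require a string (or buffer), so the iterator that `islice` returns is rejected:

    import itertools
    import re

    patt = re.compile("abc")
    sub = itertools.islice("xxabc", 2, None)  # an iterator, not a string
    patt.match(sub)  # raises TypeError: expected string or buffer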

2 Answers


This is actually Dunes's answer; unfortunately he only gave it as a comment and doesn't seem inclined to post it as a proper answer.

The main performance culprit was the string slices. Some timing measurements showed that slicing performance degrades perceptibly with big slices (i.e., taking a big slice from an already big string). To work around that I now use the `pos` parameter of the regex object methods:

    if delimiter:
        # search/match from cursor onwards; no copy of the string tail is made
        mo = self.stringEnd[delimiter].search(inData, pos=cursor)
    else:
        mo = self.patt.match(inData, pos=cursor)

Note that with `pos` the match object's offsets are indices into the whole string, so the cursor update becomes `cursor = mo.end()` instead of `cursor += mo.end()`.

Thanks to all who helped.
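A rough micro-benchmark illustrates the difference (string size and pattern are made up for illustration):

    import timeit

    setup = "import re; data = u'a' * 1000000; patt = re.compile(u'a' * 10)"

    # Each call first copies the 500,000-character tail of the string ...
    print(timeit.timeit("patt.match(data[500000:])", setup=setup, number=10000))
    # ... whereas pos lets the regex engine start mid-string without a copy.
    print(timeit.timeit("patt.match(data, pos=500000)", setup=setup, number=10000))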

ThomasH
  • Ah, sorry. I was very busy at work the past few days. I only figured out the problem; my suggested solution wasn't adequate. So credit to you for finding the solution. – Dunes Jun 11 '11 at 13:24
  • @Dunes I was asking for the problem, so your comment was quite adequate. Next time :-) . – ThomasH Jun 11 '11 at 19:53

If I'm reading your sample correctly, you are taking a generator object, putting it into `delimiter`, and using it for a dict lookup. That may not be your speed issue, but I'm pretty sure that's a bug.

Nick ODell
  • If you refer to the `delimiter = (yield ...)` part, no. This function is a **coroutine**, which allows the user to do `co.send(x)`, which resumes execution (like `next(generator)`) and makes `(yield ...)` evaluate to `x` (if you simply use it as an iterable, it evaluates to `None` IIRC). – delnan Jun 08 '11 at 18:28
  • Yes, as delnan wrote, I sometimes pass short strings into the generator (using `.send` from the outside) to make it switch to a different regex for the next chunk. – ThomasH Jun 08 '11 at 18:54
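To illustrate the coroutine behaviour delnan describes, a tiny standalone example:

    def echo():
        received = None
        while True:
            # (yield ...) hands a value to the consumer, then evaluates to
            # whatever the consumer passes back in via send().
            received = (yield received)

    co = echo()
    next(co)              # prime the coroutine: run to the first yield
    print(co.send(42))    # (yield ...) evaluates to 42, which is yielded back -> 42
    print(co.send("hi"))  # -> hi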