What makes python's itertools.groupby so fast?

Question

I am evaluating successive terms of the look and say sequence (more about it here: https://en.wikipedia.org/wiki/Look-and-say_sequence).

I used two approaches and timed them both. In the first approach, I just iterated over each term to construct the next one. In the second, I used itertools.groupby() to do the same. The time difference is pretty dramatic, so I made a figure for fun:

What makes itertools.groupby() so efficient? The code for the two approaches is below:

1st approach:

def find_block(seq):
    block = [seq[0]]
    seq.pop(0)

    while seq and seq[0] == block[0]:
        block.append(seq[0])
        seq.pop(0)

    return block

old = list('1113222113')
new = []
version1_time = []

for i in range(40):
    start = time.time()
    while old:
        block = find_block(old)
        new.append(str(len(block)))
        new.append(block[0])
    old, new = new, []
    end = time.time()
    version1_time.append(end-start)

2nd approach:

seq = '1113222113'
version2_time = []

def lookandread(seq):
    new = ''
    for value, group in itertools.groupby(seq):
        new += '{}{}'.format(len(list(group)), value)
    return new

for i in range(40):
    start = time.time()
    seq = lookandread(seq)
    end = time.time()
    version2_time.append(end-start)

As a side note, you should not benchmark manually like this; use [`timeit`](https://docs.python.org/3/library/timeit.html). The docs for `time` even explain why. It may not be relevant here (although there are _some_ questions where the whole answer turns out to be "your benchmarks are wrong", it's certainly not always the case), but it's better to do the right thing than to do the wrong thing and hope it doesn't matter this time. — abarnert, Jul 04 '18 at 23:18
Your first approach is written in a way that incredibly waste resources. It creates lists and pops elements all the time!! Maybe rewrite it using a set and generator functions, it will be closer to what `groupby()` does — nosklo, Jul 04 '18 at 23:19
Please read `groupby`'s implementation-equivalent python code: https://docs.python.org/3/library/itertools.html#itertools.groupby , which costs just O(n), while your implementation costs O(n^2). — blhsing, Jul 04 '18 at 23:26
@PaulRooney That isn't really relevant here. Implementing something in C can't change quadratic time into constant time; all it can do is reduce the constant factor. I'd be willing to be that if you just If you copy and paste the pure-Python "rough equivalent" `groupby` out of the `itertools` docs, it's somewhere between 1% and 1000% slower, but it retains the constant growth of the C implementation. — abarnert, Jul 04 '18 at 23:27
@PaulRooney The reason they're both N^2 faster than the OP's manual solution is that the OP does work per iteration that's quadratic on the size (popping N elements off the front of a list one by one, etc.), as nosklo explained, while `groupby` does constant work per iteration, so one is O(N^2M) while the other is just O(M). — abarnert, Jul 04 '18 at 23:29
Thanks for all the answers. I changed my code and dropped all instances of `list.pop()` and now it runs pretty fast too. @abarnert, after your suggestion I read about the `timeit` module and the problems it solves (quantization errors, other system processes), and I would like to use it. However, my problem is that `timeit` does not return the values computed while running the code, and I need each term of the sequence to generate the next one. Is there a way around this? — titiree, Jul 05 '18 at 21:48
Take a look at [gojomo](https://stackoverflow.com/users/130288/gojomo)' response as edited by [Nicholas Riley](https://stackoverflow.com/users/6372/nicholas-riley) to the StackOverflow question [How to measure elapsed time in Python?](https://stackoverflow.com/questions/7370801/how-to-measure-elapsed-time-in-python).. This response provides a mechanism using a context manager which would allow passing parameters of the function. — itprorh66, Jun 21 '21 at 20:00

What makes python's itertools.groupby so fast?

0 Answers0