
I have a Python function that takes a list and returns a generator yielding 2-tuples of each adjacent pair, e.g.

>>> list(pairs([1, 2, 3, 4]))
[(1, 2), (2, 3), (3, 4)]

I've considered an implementation using 2 slices:

def pairs(xs):
    for p in zip(xs[:-1], xs[1:]): 
        yield p

and one written in a more procedural style:

def pairs(xs):
    last = object()
    dummy = last
    for x in xs:
        if last is not dummy:
            yield last,x
        last = x

Testing using range(2 ** 15) as input yields the following times (you can find my testing code and output here):

2 slices: 100 loops, best of 3: 4.23 msec per loop
0 slices: 100 loops, best of 3: 5.68 msec per loop

Part of the performance hit for the sliceless implementation is the comparison in the loop (if last is not dummy). Removing that (making the output incorrect) improves its performance, but it's still slower than the zip-a-pair-of-slices implementation:

2 slices: 100 loops, best of 3: 4.48 msec per loop
0 slices: 100 loops, best of 3: 5.2 msec per loop

So, I'm stumped. Why is zipping together 2 slices, effectively iterating over the list twice in parallel, faster than iterating once, updating last and x as you go?
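For reference, here is a self-contained sketch of the kind of harness used for these timings (my reconstruction, since the linked test code isn't shown here; the original benchmarks were run under Python 2, where zip builds a full list):

```python
import timeit

def pairs_slices(xs):
    # "2 slices": zip two offset copies of the list
    for p in zip(xs[:-1], xs[1:]):
        yield p

def pairs_loop(xs):
    # "0 slices": track the previous element via a sentinel
    last = object()
    dummy = last
    for x in xs:
        if last is not dummy:
            yield last, x
        last = x

xs = list(range(2 ** 15))

for name, fn in [("2 slices", pairs_slices), ("0 slices", pairs_loop)]:
    per_loop = timeit.timeit(lambda: list(fn(xs)), number=100) / 100
    print("%s: %.2f msec per loop" % (name, per_loop * 1000))
```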

EDIT

Dan Lenski proposed a third implementation:

def pairs(xs):
    for ii in range(1,len(xs)):
        yield xs[ii-1], xs[ii]

Here's its comparison to the other implementations:

2 slices: 100 loops, best of 3: 4.37 msec per loop
0 slices: 100 loops, best of 3: 5.61 msec per loop
Lenski's: 100 loops, best of 3: 6.43 msec per loop

It's even slower! Which is baffling to me.

EDIT 2:

ssm suggested using itertools.izip instead of zip, and it's even faster than zip:

2 slices, izip: 100 loops, best of 3: 3.68 msec per loop

So, izip is the winner so far! But still for difficult-to-inspect reasons.
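A version-agnostic sketch of the izip variant (on Python 3 the built-in zip is already lazy, which is presumably why there is no separate izip there):

```python
try:
    from itertools import izip  # Python 2: lazy pairing
except ImportError:
    izip = zip  # Python 3: built-in zip is already lazy

def pairs_izip(xs):
    # same shape as the "2 slices" version, but pairs are
    # produced lazily instead of building a full list first
    for p in izip(xs[:-1], xs[1:]):
        yield p

print(list(pairs_izip([1, 2, 3, 4])))
```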

Jim Witschey
  • if you `import dis` and use `dis.dis(pairs)` on each function and compare the output, it may help. – Padraic Cunningham Sep 17 '14 at 00:36
  • I think `zip` actually creates the entire list in memory before starting to do the yield. What does the performance of `itertools.izip` look like? That is much closer to your piece of procedural code – ssm Sep 17 '14 at 00:54
  • The Python-level indexing of the `xs` list may be slowing things down; there's some range-checking involved. Other than that, I'm not sure why my version would be slower... drat! – Dan Lenski Sep 17 '14 at 01:29
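Following the dis suggestion in the comments, a minimal sketch (my function names) of how one might compare the two versions' bytecode:

```python
import dis

def pairs_slices(xs):
    # zip two offset slices; the pairing happens inside zip, in C
    for p in zip(xs[:-1], xs[1:]):
        yield p

def pairs_loop(xs):
    # sentinel-based version; note the identity test in the loop body
    last = object()
    dummy = last
    for x in xs:
        if last is not dummy:
            yield last, x
        last = x

# The loop version's bytecode includes an extra identity comparison
# (COMPARE_OP on older Pythons, IS_OP on 3.9+) and tuple-building
# opcodes per iteration that the sliced version avoids.
dis.dis(pairs_loop)
dis.dis(pairs_slices)
```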

3 Answers


These are the results for izip, which is actually closer to your implementation. They look like what you would expect. The zip version creates the entire list in memory within the function, so it is the fastest. The loop version just loops through the list, so it is a little slower. The izip version resembles your code most closely, but I am guessing there are some memory-management backend processes which increase the execution time.

In [11]: %timeit pairsLoop([1,2,3,4,5])
1000000 loops, best of 3: 651 ns per loop

In [12]: %timeit pairsZip([1,2,3,4,5])
1000000 loops, best of 3: 637 ns per loop

In [13]: %timeit pairsIzip([1,2,3,4,5])
1000000 loops, best of 3: 655 ns per loop

The code versions are shown below, as requested:

from itertools import izip


def pairsIzip(xs):
    for p in izip(xs[:-1], xs[1:]): 
        yield p

def pairsZip(xs):
    for p in zip(xs[:-1], xs[1:]): 
        yield p

def pairsLoop(xs):
    last = object()
    dummy = last
    for x in xs:
        if last is not dummy:
            yield last,x
        last = x
ssm
  • What is your `pairsLoop` version? I don't think there should be a significant difference between `zip` and `izip` versions as long as the entire list(s) can fit in memory easily. The likely culprit is that the OP's loop version is doing extra work in every loop, as [I wrote above](http://stackoverflow.com/a/25880602/20789). – Dan Lenski Sep 17 '14 at 01:20
    I have updated the post to include everything. I didn't change any of the code from the OP. – ssm Sep 17 '14 at 01:23
  • You're running a somewhat different test from the OP: you're doing many (1,000,000) iterations with a short list (`range(1,6)`), while the OP is doing a few (100) iterations of a long list (`range(2**15)`). That might well make a difference... – Dan Lenski Sep 17 '14 at 01:26

Lots of interesting discussion elsewhere in this thread. Basically, we started out comparing two versions of this function, which I'm going to describe with the following dumb names:

  1. The "zip-py" version:

    def pairs(xs):
        for p in zip(xs[:-1], xs[1:]): 
            yield p
    
  2. The "loopy" version:

    def pairs(xs):
        last = object()
        dummy = last
        for x in xs:
            if last is not dummy:
                yield last,x
            last = x
    

So why does the loopy version turn out to be slower? Basically, I think it comes down to a couple things:

  1. The loopy version explicitly does extra work: it compares two objects' identities (if last is not dummy: ...) on every pair-generating iteration of the inner loop.

    • @mambocab's edit shows that not doing this comparison does make the loopy version
      slightly faster, but doesn't fully close the gap.

  2. The zippy version does more stuff in compiled C code than the loopy version does in Python code:

    • Combining two objects into a tuple. The loopy version does yield last,x, while in the zippy version the tuple p comes straight from zip, so it just does yield p.

    • Binding variable names to objects: the loopy version does this twice in every loop, assigning x in the for loop and last=x. The zippy version does this just once, in the for loop.

  3. Interestingly, there is one way in which the zippy version is actually doing more work: it uses two listiterators, iter(xs[:-1]) and iter(xs[1:]), which get passed to zip. The loopy version only uses one listiterator (for x in xs).

    • Creating a listiterator object (what iter() returns for a list) is likely a highly optimized operation, since Python programmers use it so frequently.
    • Iterating over the list slices, xs[:-1] and xs[1:], adds almost no per-iteration overhead compared to iterating over the whole list. The slices do copy the list's contents up front, but that copy happens in C, and it doesn't change what happens on each iteration.
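Since xs[:-1] and xs[1:] each build a new list, one could avoid the copies while keeping the zip structure with itertools.islice (a sketch I haven't benchmarked against the versions above; substitute izip for zip on Python 2):

```python
from itertools import islice

def pairs_islice(xs):
    # zip the list against a lazily offset view of itself,
    # skipping the copies made by xs[:-1] and xs[1:]
    return zip(xs, islice(xs, 1, None))
```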
Dan Lenski
    Fun fact: when possible, the Python 3 iterator implementation of `zip` actually [reuses the tuple yielded at each step of iteration](https://hg.python.org/cpython/file/25ecf3d0ea03/Python/bltinmodule.c#l2304) to avoid allocating a new tuple each time. I was using Python 2 for the benchmarking in this question, so this tuple reuse isn't directly relevant to the answer. Still interesting though! – Jim Witschey Jan 26 '15 at 05:50

I suspect the main reason that the second version is slower is because it does a comparison operation for every single pair that it yields:

# pair-generating loop
for x in xs:
    if last is not dummy:
       yield last,x
    last = x

The first version does not do anything but spit out values. With the loop variables renamed, it's equivalent to this:

# pair-generating loop
for last,x in zip(xs[:-1], xs[1:]):
    yield last,x 

It's not especially pretty or Pythonic, but you could write a procedural version without a comparison in the inner loop. How fast does this one run?

def pairs(xs):
    for ii in range(1,len(xs)):
        yield xs[ii-1], xs[ii]
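Another comparison-free variant (my sketch, essentially the classic pairwise recipe from the itertools docs) avoids both the sentinel test and the per-iteration indexing by advancing a second iterator over the same list:

```python
def pairs_two_iters(xs):
    # two independent iterators over the same list; advance the
    # second by one step so zip yields (xs[i], xs[i+1])
    a, b = iter(xs), iter(xs)
    next(b, None)
    return zip(a, b)
```

This works for sequences like lists, which can be iterated more than once; for a one-shot iterator you'd need itertools.tee instead.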
Dan Lenski
    I don't think I communicated it very clearly, but I mentioned in my question that I tried it without the comparison (making the output incorrect). The timing for that is at the very bottom of my question -- it seems that the comparison is a good chunk of the runtime, but it's still slower than the slice-and-zip approach. – Jim Witschey Sep 17 '14 at 01:21
    I can't format the output as I'd like in this comment, so I'm adding an assessment of your proposed implementation to the question itself. – Jim Witschey Sep 17 '14 at 01:22
  • Another "extra piece of work" that the loop version does, implicitly, is that it combines two separate objects into a `tuple` (`yield last,x`). The `zip` version does that in compiled C code rather than in Python... that may give a slight speedup as well. – Dan Lenski Sep 17 '14 at 01:23
  • Your comments kind of sum up to "well, `zip`/`izip` deal with the list code in C, and that's probably why they're faster". That's good enough for me. If you compile a single answer that points out the operations in the loop that are (probably) optimized away, I'll accept it. – Jim Witschey Sep 17 '14 at 01:44