I have a Python function that takes a list and returns a generator yielding each adjacent pair of elements as a 2-tuple, e.g.
>>> list(pairs([1, 2, 3, 4]))
[(1, 2), (2, 3), (3, 4)]
I've considered an implementation using 2 slices:
def pairs(xs):
    for p in zip(xs[:-1], xs[1:]):
        yield p
and one written in a more procedural style:
def pairs(xs):
    last = object()   # unique sentinel meaning "no previous element yet"
    dummy = last
    for x in xs:
        if last is not dummy:
            yield last, x
        last = x
Testing using range(2 ** 15) as input yields the following times (you can find my testing code and output here):
2 slices: 100 loops, best of 3: 4.23 msec per loop
0 slices: 100 loops, best of 3: 5.68 msec per loop
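For reference, the timings come from a timeit run along these lines (a minimal sketch; the exact harness is in the linked testing code, and the helper below just re-uses the 2-slice implementation):

import timeit

def pairs(xs):
    for p in zip(xs[:-1], xs[1:]):
        yield p

xs = range(2 ** 15)  # a plain list under Python 2

# 100 loops, best of 3, reported as msec per loop
best = min(timeit.repeat(lambda: list(pairs(xs)), repeat=3, number=100))
print('100 loops, best of 3: %.3g msec per loop' % (best / 100 * 1000))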
Part of the performance hit for the sliceless implementation is the comparison in the loop (if last is not dummy). Removing that check (making the output incorrect) improves its performance, but it's still slower than the zip-a-pair-of-slices implementation:
2 slices: 100 loops, best of 3: 4.48 msec per loop
0 slices: 100 loops, best of 3: 5.2 msec per loop
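For concreteness, the check-free variant presumably looks like this (a sketch; pairs_nocheck is just an illustrative name, and the bogus leading (sentinel, xs[0]) pair is what makes the output incorrect):

def pairs_nocheck(xs):
    last = object()  # sentinel; leaks into the first yielded pair
    for x in xs:
        yield last, x
        last = x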
So, I'm stumped. Why is zipping together 2 slices, effectively iterating over the list twice in parallel, faster than iterating once, updating last and x as you go?
EDIT
Dan Lenski proposed a third implementation:
def pairs(xs):
    for ii in range(1, len(xs)):
        yield xs[ii - 1], xs[ii]
Here's how it compares to the other implementations:
2 slices: 100 loops, best of 3: 4.37 msec per loop
0 slices: 100 loops, best of 3: 5.61 msec per loop
Lenski's: 100 loops, best of 3: 6.43 msec per loop
It's even slower, which is baffling to me.
EDIT 2:
ssm suggested using itertools.izip instead of zip, and it's even faster than zip:
2 slices, izip: 100 loops, best of 3: 3.68 msec per loop
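The izip variant just swaps izip into the 2-slice implementation (shown here as a sketch; note izip is Python 2 only, since Python 3's zip is already lazy):

from itertools import izip

def pairs(xs):
    # same two slices, but izip yields pairs lazily instead of
    # building the full list of tuples up front the way zip() does
    for p in izip(xs[:-1], xs[1:]):
        yield p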
So, izip is the winner so far! But still for difficult-to-inspect reasons.