The timings are closer on my machine:
>>> x = {1: [1, 2], 2: [2, 3], 3: [3, 4]}
>>> l = [1, 2, 3] * 10000000
>>> import timeit
>>> import itertools
>>> def by_chaining():
...     y = [x[i] for i in l]
...     return list(itertools.chain.from_iterable(y))
...
>>> timeit.timeit(by_chaining, number=1)
5.491670315985385
>>> def by_extension():
...     y = []
...     [y.extend(x[i]) for i in l]
...     return y
...
>>> timeit.timeit(by_extension, number=1)
5.656423713273028
The thing is, though, the .extend approach is really meant to be used in an explicit loop, because it's a function with side effects. With the list comprehension, you're producing a separate list full of None values that is then thrown away, which adds quite a bit of overhead. See:
>>> def by_extension_loop():
...     y = []
...     for i in l: y.extend(x[i])
...     return y
...
>>> timeit.timeit(by_extension_loop, number=1)
4.62852763674681
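To make that overhead concrete, here's a small sketch (as a plain script rather than a REPL session) showing the throwaway list of None values that the comprehension version builds as a side product:

```python
x = {1: [1, 2], 2: [2, 3], 3: [3, 4]}
l = [1, 2, 3]

y = []
# list.extend returns None, so the comprehension collects one None per call
# into a list that is built and then immediately discarded.
throwaway = [y.extend(x[i]) for i in l]
print(throwaway)  # [None, None, None]
print(y)          # [1, 2, 2, 3, 3, 4]
```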
Let's try some other approaches:
>>> def by_nested_comprehension():
...     return [e for i in l for e in x[i]]
...
>>> timeit.timeit(by_nested_comprehension, number=1)
5.102275806385393
Don't do this one. It gets the correct answer theoretically, but will take forever and uses excessive memory. (It's probably part of the reason why sum was explicitly special-cased to refuse to sum strings - although I kinda wish they'd redirected it to a more efficient technique instead).
>>> def by_sum():
...     return sum((x[i] for i in l), [])
...
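(As an aside, here's a quick sketch of that string special-casing: sum rejects a str start value outright rather than falling back to quadratic concatenation.)

```python
# sum() is explicitly special-cased to refuse strings and to
# point you at str.join instead of doing repeated concatenation.
try:
    sum(["a", "b", "c"], "")
except TypeError as e:
    print(e)
```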
How about giving from_iterable a generator to work with?
>>> def by_chaining_generator():
...     return list(itertools.chain.from_iterable(x[i] for i in l))
...
>>> timeit.timeit(by_chaining_generator, number=1)
5.420730297100135
How about using map instead of comprehensions?
>>> def by_chaining_map():
...     return list(itertools.chain.from_iterable(map(x.get, l)))
...
>>> timeit.timeit(by_chaining_map, number=1)
4.707590696974194
>>> def by_chaining_map_2():
...     return list(itertools.chain.from_iterable(map(x.__getitem__, l)))
...
>>> timeit.timeit(by_chaining_map_2, number=1)
4.576851915207953
Looks like that one might be a winner, but it's close. It's also possible to optimize the list.extend version by pre-allocating the result list and inserting slices, IIRC. But my creativity seems exhausted for the moment :)
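For completeness, here's a sketch of what that pre-allocation idea might look like. It assumes every value list in x has the same length (which holds for this benchmark's data); a general-purpose version would need to handle uneven lengths differently.

```python
import itertools

x = {1: [1, 2], 2: [2, 3], 3: [3, 4]}
l = [1, 2, 3] * 10

def by_preallocation():
    # Assumes all value lists in x share the same length k.
    k = len(next(iter(x.values())))
    # Allocate the full result once, then fill it with slice assignment.
    y = [None] * (len(l) * k)
    for idx, i in enumerate(l):
        y[idx * k:(idx + 1) * k] = x[i]
    return y

# Sanity check against the chaining approach
assert by_preallocation() == list(itertools.chain.from_iterable(x[i] for i in l))
```

Whether this actually beats the map-based chaining would need measuring; slice assignment avoids repeated list growth, but the index arithmetic happens in Python-level bytecode.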