Here is the code:

import itertools, time

x = {1: [1, 2], 2: [2, 3], 3: [3, 4]}
l = [1, 2, 3] * 10000000

# Approach 1: build a nested list, then flatten it afterwards
start = time.time()
y = [x[i] for i in l]
y = list(itertools.chain.from_iterable(y))
print(y[:6])
print(time.time() - start)

# Approach 2: build the flat list as we go, by calling .extend in a comprehension
start = time.time()
y = []
[y.extend(x[i]) for i in l]
print(y[:6])
print(time.time() - start)

The first approach builds a nested list and flattens it once the comprehension has finished. The second constructs the flat list directly, extending it as the comprehension runs.

It seems the first approach is a bit faster:

3.8479559421539307
4.469805955886841

I am wondering if there are better (faster) ways to do it.

qed

2 Answers

The timings are closer on my machine:

>>> x = {1: [1, 2], 2: [2, 3], 3: [3, 4]}
>>> l = [1, 2, 3] * 10000000
>>> import timeit
>>> import itertools

>>> def by_chaining():
...     y = [x[i] for i in l]
...     return list(itertools.chain.from_iterable(y))
...
>>> timeit.timeit(by_chaining, number=1)
5.491670315985385

>>> def by_extension():
...     y = []
...     [y.extend(x[i]) for i in l]
...     return y
...
>>> timeit.timeit(by_extension, number=1)
5.656423713273028

The thing is, though, the .extend approach is really meant to be used in an explicit loop, because it's a function with side effects. With the list comprehension, you're producing a separate list full of None values that is then thrown away, which adds quite a bit of overhead. See:

>>> def by_extension_loop():
...     y = []
...     for i in l: y.extend(x[i])
...     return y
...
>>> timeit.timeit(by_extension_loop, number=1)
4.62852763674681
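
You can make that throwaway list visible on a small input: every .extend call returns None, so the comprehension materialises a list of Nones as long as l before discarding it:

>>> y = []
>>> [y.extend(x[i]) for i in [1, 2, 3]]
[None, None, None]
>>> y
[1, 2, 2, 3, 3, 4]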

Let's try some other approaches:

>>> def by_nested_comprehension():
...     return [e for i in l for e in x[i]]
...
>>> timeit.timeit(by_nested_comprehension, number=1)
5.102275806385393

Don't do this one. It produces the correct answer, but will take forever and use excessive memory: each + on a list builds a brand-new list, so summing n small lists copies O(n^2) elements in total. (It's probably part of the reason why sum was explicitly special-cased to refuse to sum strings, although I kinda wish they'd redirected it to a more efficient technique instead.)

>>> def by_sum():
...     return sum((x[i] for i in l), [])
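
If you want to see that superlinear growth without waiting forever, time it on small slices of l. The absolute numbers are machine-dependent, but doubling the slice should roughly quadruple the runtime:

>>> timeit.timeit(lambda: sum((x[i] for i in l[:10000]), []), number=1)  # some time t
>>> timeit.timeit(lambda: sum((x[i] for i in l[:20000]), []), number=1)  # roughly 4*t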

How about giving from_iterable a generator to work with?

>>> def by_chaining_generator():
...     return list(itertools.chain.from_iterable(x[i] for i in l))
...
>>> timeit.timeit(by_chaining_generator, number=1)
5.420730297100135

How about using map instead of comprehensions?

>>> def by_chaining_map():
...     return list(itertools.chain.from_iterable(map(x.get, l)))
...
>>> timeit.timeit(by_chaining_map, number=1)
4.707590696974194

>>> def by_chaining_map_2():
...     return list(itertools.chain.from_iterable(map(x.__getitem__, l)))
...
>>> timeit.timeit(by_chaining_map_2, number=1)
4.576851915207953
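
One behavioural difference to keep in mind between the two: x.get returns None for a missing key, which chain.from_iterable would only trip over later with a TypeError, while x.__getitem__ fails fast with a KeyError:

>>> x.get(99) is None
True
>>> x[99]
Traceback (most recent call last):
  ...
KeyError: 99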

Looks like that one might be a winner, but it's close. It's also possible to optimize the list.extend version by pre-allocating the output list and assigning into slices, IIRC. But my creativity seems exhausted for the moment :)
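
For completeness, here is a minimal (untimed) sketch of that pre-allocation idea; it assumes every value list in x has the same fixed length, which holds for this data:

>>> def by_preallocation():
...     step = 2  # length of every value list in x (assumed uniform)
...     y = [None] * (len(l) * step)
...     for j, i in enumerate(l):
...         y[j * step:(j + 1) * step] = x[i]
...     return y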

Karl Knechtel

The answer is actually hidden in your question. A much faster way to achieve what you want is:

l = [1, 2, 3]
x = {1: [1, 2], 2: [2, 3], 3: [3, 4]}

# Flatten one period of the repeating pattern...
y = []
for k in l:
    y.extend(x[k])

# ...then replicate it 10,000,000 times in one step.
y *= 10000000
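
This exploits the structure of the input: l is just [1, 2, 3] repeated 10,000,000 times, so you only need to flatten one period and let list repetition (a fast block copy at the C level) do the rest. A quick check that it starts the same way as the other approaches:

>>> y[:6]
[1, 2, 2, 3, 3, 4]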
Nasser Al-Shawwa