Faster Python List Comprehension

Question

I have a bit of code that runs many thousands of times in my project:

def resample(freq, data):
    output = []
    for i, elem in enumerate(freq):
        for _ in range(elem):
            output.append(data[i])
    return output

e.g. resample([1,2,3], ['a', 'b', 'c']) => ['a', 'b', 'b', 'c', 'c', 'c']

I want to speed this up as much as possible. It seems like a list comprehension could be faster. I have tried:

def resample(freq, data):
    return [item for sublist in [[data[i]]*elem for i, elem in enumerate(freq)] for item in sublist]

This is hideous and also slow, because it builds a list of lists and then flattens it. Is there a fast way to do this with a one-line list comprehension? Or maybe something with numpy?

Thanks in advance!

edit: Answer does not necessarily need to eliminate the nested loops, fastest code is the best

List comprehensions are not faster than the equivalent for loops, because they do exactly the same operations. — Daniel Roseman, Jun 29 '18 at 16:19
Just use `[e for i, e in enumerate(y) for j in range(x[i])]` — user3483203, Jun 29 '18 at 16:21
What sort of inputs are you talking about? If the numbers in `freq` are large then perhaps using `extend` in a single loop might be better than `append` — John Coleman, Jun 29 '18 at 16:21
I don't agree with the closing @jonrsharpe. It is not a duplicate of that question. — Bharel, Jun 29 '18 at 16:22
I understood the OP to be asking how to write a list comprehension equivalent of nested for loops, which the duplicate does cover. If not, could they please [edit] to clarify. — jonrsharpe, Jun 29 '18 at 16:24
@jonrsharpe he is not asking that. He is asking how to make the `resample` function which repeats a char based on a list of numbers. My implementation has no nested loops — nosklo, Jun 29 '18 at 16:25
In case you are trying to frequency weight data, note that `numpy` and `pandas` are able to deal with weights directly, e.g. to take an average https://docs.scipy.org/doc/numpy/reference/generated/numpy.average.html — Stuart, Jun 29 '18 at 16:28
@Stuart, `np.average` doesn't work with flexible types like this — user3483203, Jun 29 '18 at 16:46
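
John Coleman's suggestion above, one `extend` call per element instead of repeated `append` calls, might look like the sketch below. The name `resample_extend` is made up for illustration, and whether it is actually faster depends on how large the counts in `freq` are.

```python
def resample_extend(freq, data):
    # Hypothetical name; same inputs and output as the question's resample().
    output = []
    for count, item in zip(freq, data):
        # [item] * count builds the whole run at once; extend adds it in one call.
        output.extend([item] * count)
    return output

print(resample_extend([1, 2, 3], ['a', 'b', 'c']))
# ['a', 'b', 'b', 'c', 'c', 'c']
```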

Bharel · Answer 1 · 2018-06-29T16:50:19.373

I highly suggest using iterators, like so:

from itertools import repeat, chain
def resample(freq, data):
    return chain.from_iterable(map(repeat, data, freq))

This is likely to be among the fastest options: map(), repeat() and chain.from_iterable() are all implemented in C in CPython, so no Python-level loop runs for each element.

As for a small explanation:

repeat(i, n) returns an iterator that repeats an item i, n times.

map(repeat, data, freq) returns an iterator that calls repeat on each pair of corresponding elements from data and freq. In other words, it is an iterator of repeat() iterators.

chain.from_iterable() flattens the iterator of iterators to return the end items.
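
To make the three steps concrete, here is a small sketch that materialises each intermediate stage with list() purely for inspection. The function itself never builds these lists, and the sample inputs are the ones from the question.

```python
from itertools import chain, repeat

data = ['a', 'b', 'c']
freq = [1, 2, 3]

# repeat(item, n) yields item exactly n times.
print(list(repeat('b', 2)))
# ['b', 'b']

# map pairs data[k] with freq[k] and calls repeat(data[k], freq[k]).
runs = [list(r) for r in map(repeat, data, freq)]
print(runs)
# [['a'], ['b', 'b'], ['c', 'c', 'c']]

# chain.from_iterable flattens the runs into a single stream.
flat = list(chain.from_iterable(map(repeat, data, freq)))
print(flat)
# ['a', 'b', 'b', 'c', 'c', 'c']
```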

No intermediate list is created along the way, which avoids the cost of building one and then flattening it. As an added benefit, it works with any type of data, not just one-character strings.

While I don't suggest it, you are able to convert it into a list() like so:

result = list(resample([1,2,3], ['a','b','c']))

Some quick testing appears to confirm that this is the fastest answer so far. — anonymoose, Jun 29 '18 at 16:27
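
A comparison like this can be reproduced with timeit. The sketch below is one possible setup, not the test anonymoose ran: the function names and input sizes are made up, and the results vary by machine and Python version, so no timings are shown.

```python
import random
import timeit
from itertools import chain, repeat

def resample_loop(freq, data):
    # The question's original nested loop.
    output = []
    for i, elem in enumerate(freq):
        for _ in range(elem):
            output.append(data[i])
    return output

def resample_chain(freq, data):
    # Answer 1, wrapped in list() so both functions do the same total work.
    return list(chain.from_iterable(map(repeat, data, freq)))

# Hypothetical workload: 1,000 items, each repeated 0 to 20 times.
random.seed(0)
data = list(range(1000))
freq = [random.randint(0, 20) for _ in data]

# Sanity check before timing: both must produce the same list.
assert resample_loop(freq, data) == resample_chain(freq, data)

for fn in (resample_loop, resample_chain):
    seconds = timeit.timeit(lambda: fn(freq, data), number=200)
    print(f"{fn.__name__}: {seconds:.4f} s for 200 calls")
```

Wrapping the lazy version in list() matters here: timing only the creation of the iterator would measure almost nothing.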

score 2 · Answer 2 · answered Jun 29 '18 at 16:21

import itertools
def resample(freq, data):
    return itertools.chain.from_iterable([el]*n for el, n in zip(data, freq))

Besides being faster, this has the advantage of being lazy: it returns an iterator, and the elements are produced one at a time as you consume it.
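
A short sketch of what lazy means in practice: with itertools.islice you can take the first few items without producing the rest. One caveat is that each `[el]*n` run is still built as a small temporary list when the chain reaches it. The inputs are the ones from the question.

```python
import itertools

def resample(freq, data):
    # The version from this answer: a chain over per-element runs.
    return itertools.chain.from_iterable([el] * n for el, n in zip(data, freq))

stream = resample([1, 2, 3], ['a', 'b', 'c'])

# Take only the first three items; the 'c' run has not been built yet.
first_three = list(itertools.islice(stream, 3))
print(first_three)
# ['a', 'b', 'b']

# The iterator remembers its position, so the remaining items follow on.
rest = list(stream)
print(rest)
# ['c', 'c', 'c']
```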

nosklo


user3483203 · Answer 3 · 2018-06-29T16:29:33.053

There is no need to create intermediate lists at all; just use a nested loop inside the comprehension:

[e for i, e in enumerate(data) for j in range(freq[i])]

# ['a', 'b', 'b', 'c', 'c', 'c']

You can just as easily make this lazy by replacing the square brackets with parentheses, which turns it into a generator expression:

(e for i, e in enumerate(data) for j in range(freq[i]))
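
For completeness, here is a sketch of both forms wrapped as functions so they can stand in for the question's resample. The function names are made up, and zip is swapped in for enumerate plus indexing, which is a stylistic variation rather than what is written above.

```python
def resample_list(freq, data):
    # Eager: builds the whole list at once.
    return [e for e, n in zip(data, freq) for _ in range(n)]

def resample_lazy(freq, data):
    # Lazy: a generator expression, evaluated only as it is consumed.
    return (e for e, n in zip(data, freq) for _ in range(n))

print(resample_list([1, 2, 3], ['a', 'b', 'c']))
# ['a', 'b', 'b', 'c', 'c', 'c']
print(list(resample_lazy([1, 2, 3], ['a', 'b', 'c'])))
# ['a', 'b', 'b', 'c', 'c', 'c']
```

One behavioural difference: zip silently stops at the shorter input, whereas indexing freq[i] raises an IndexError if freq is shorter than data.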

edited Jun 29 '18 at 16:29

answered Jun 29 '18 at 16:24

