Counting identical elements running in a row

Question

The task is: there is a list of two different values, here 0 and 1(in a real problem, string values). You need to make a list of the number of zeros going in a row (something like a "combo" of zeros). So if data is: [0, 1, 1, 0, 0, 0, 1, 0, 0,] resul must be: [1, 3, 2] (the number of "combos" of zeros) My code using a loop:

data = [0, 1, 1, 0, 0, 0, 1, 0, 0,]

zeros_combo = []
count_zeros = 0
for value in data:
    if value == 0:
        count_zeros += 1
    else:
        # reset the counter, here value == 1
        if count_zeros > 0:
            zeros_combo.append(count_zeros)
            count_zeros = 0

# Adding the last values
if count_zeros > 0:
    zeros_combo.append(count_zeros)
print(zeros_combo) # result: [1, 3, 2]

Is there any way to remake this code without using a loop to speed up? Maybe, with pandas, numpy, or other tools.. Vectorized functions.. Cause with the iteration of elements in the loop and comparison, it turns out slowly.

I tried different ways, almost all turned out to be slower.

res = [v for v in map(len, ''.join(map(str, data)).split('1')) if v != 0]

res = list(filter(lambda v: v != 0, map(len, ''.join(map(str, data)).split('1'))))

import re
res = list(map(len, re.findall(r'0+', ''.join(map(str, data)))))

from itertools import groupby
res = [len(list(g)) for k, g in groupby(data, lambda x: x==0) if k]

Super slow:

import pandas as pd
s = pd.Series(data)
res = s[s==0].groupby(s.diff().ne(0).cumsum()).agg(len).values

Do you want to improve in speed, or you want to improve on time complexity ? — Deepak Tripathi, Nov 05 '22 at 19:22
@Psidom my way **itertools.groupby** is slower than **for-loop** — Misha, Nov 05 '22 at 19:30
Does this answer your question? [Count consecutive occurences of values varying in length in a numpy array](https://stackoverflow.com/questions/24342047/count-consecutive-occurences-of-values-varying-in-length-in-a-numpy-array) - especially https://stackoverflow.com/a/24343375/463796 with `condition = ~np.array(data)`. It's around 9x faster on my machine for 1M elements, given data is already a numpy array. — w-m, Nov 07 '22 at 12:06
@w-m His task is almost the same. The solution turned out to be faster. Thanks! How did you find this question? on the website? — Misha, Nov 07 '22 at 16:26
Glad it helped! I googled something like "count consecutive elements numpy" :) — w-m, Nov 07 '22 at 18:11

score 0 · Answer 1 · answered Nov 05 '22 at 19:37

Your function is pretty good as it runs in O(n) time complexity. That means your code scales linearly - the more items in a list, the longer it will take.

So from an algorithmic standpoint, I don't think there is anything to improve. Regarding the actual time it takes in python, I can't think of any built-in functions that could make this faster.

But I know a library that could help. It's called Numba, and it is a compiler for python. It isn't that popular because you can only use built-in functions (so forget about any libraries). But if you are trying to speed up this code alone it should work fine.

Another option would be to translate it to a different language (preferably a compiled one like C, C++, Java, Go, etc.). Again, this is useful only if you need to rewrite this code alone, not an entire project.

what i should do with Numba? i used jit decorator and rewrote for-loop to **"prange"** `@jit(fastmath=True, forceobj=True, parallel=True)`. it started to work slower — Misha, Nov 05 '22 at 20:12
@Misha Well because Numba is compiled, it always runs slower the first time you run it and then runs fast the second, third, fourth, etc. time. — Vladislav Korecký, Nov 06 '22 at 07:23

Counting identical elements running in a row

1 Answers1