Count consecutive values within an array with multiple values numpy/pandas

Question

I checked this question and others on SO but the trick is always summing True or False values.

My case is the following array:

arr = [1,2,3,3,4,5,6,1,1,1,5,5,8,8,8,9,4,4,4]

For each member of the array, I want to get the length of the "current" streak of repeated values.

For the example above I would like to get:

res = [1,1,1,2,1,1,1,1,2,3,1,2,1,2,3,1,1,2,3]

I could write a dumb loop, but is there a clever or already built-in way to do this in numpy/pandas?

a very minor adaptation is needed for the solution you linked to work for your case... — Adam.Er8, Nov 12 '19 at 07:56
@Chapo Think you need to edit the title to reflect that you want to create a *ranged-array* instead, not just get the counts. — Divakar, Nov 12 '19 at 08:07

Divakar · Answer 1 · 2019-11-12T08:17:38.813


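A pandas way, assuming arr has first been converted to a NumPy array (e.g. arr = np.asarray(arr), since the elementwise != below does not work on a plain list), would be -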

In [55]: I = np.r_[False,arr[:-1]!=arr[1:]].cumsum()

In [56]: df = pd.DataFrame({'ids':I,'val':np.ones(len(arr),dtype=int)})

In [57]: df.groupby('ids')[['val']].cumsum().values.ravel()
Out[57]: array([1, 1, 1, 2, 1, 1, 1, 1, 2, 3, 1, 2, 1, 2, 3, 1, 1, 2, 3])
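
To make the grouping step easier to follow, here is a minimal standalone sketch of the same idea, assuming arr is the array from the question. The names change, group_id and streak are chosen here for illustration only and do not appear in the answer.

```python
import numpy as np
import pandas as pd

arr = np.array([1, 2, 3, 3, 4, 5, 6, 1, 1, 1, 5, 5, 8, 8, 8, 9, 4, 4, 4])

# True wherever a value differs from its predecessor
# (the first element is treated as "no change").
change = np.r_[False, arr[:-1] != arr[1:]]

# Running total of changes: every run of equal values shares one id.
group_id = change.cumsum()

# 1-based position of each element within its run.
streak = pd.Series(group_id).groupby(group_id).cumcount().to_numpy() + 1

print(group_id.tolist())
# [0, 1, 2, 2, 3, 4, 5, 6, 6, 6, 7, 7, 8, 8, 8, 9, 10, 10, 10]
print(streak.tolist())
# [1, 1, 1, 2, 1, 1, 1, 1, 2, 3, 1, 2, 1, 2, 3, 1, 1, 2, 3]
```

Using cumcount() + 1 here instead of a cumulative sum over a column of ones is only a simplification; both should give the same counts.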

Another option uses a custom NumPy helper, intervaled_ranges, which creates ranges based on interval lengths/sizes (it is not a NumPy built-in, and its definition is not included in this answer) -

In [91]: m = np.r_[True,arr[:-1]!=arr[1:],True]

In [92]: intervaled_ranges(np.diff(np.flatnonzero(m)),start=1)
Out[92]: array([1, 1, 1, 2, 1, 1, 1, 1, 2, 3, 1, 2, 1, 2, 3, 1, 1, 2, 3])
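
Since intervaled_ranges is not defined in this thread, the following is a hypothetical reconstruction of what such a helper might look like, not necessarily the author's exact code. It assumes a non-empty array of strictly positive integer sizes, which is what np.diff(np.flatnonzero(m)) produces for a non-empty arr.

```python
import numpy as np

def intervaled_ranges(sizes, start=0):
    """Concatenate ranges start, start+1, ... with the given lengths.

    Hypothetical sketch; assumes `sizes` is non-empty and all entries are >= 1.
    """
    sizes = np.asarray(sizes, dtype=int)
    steps = np.ones(sizes.sum(), dtype=int)
    ends = sizes.cumsum()
    # At the first position of every run after the first one,
    # step back so the running total returns to `start`.
    steps[ends[:-1]] = 1 - sizes[:-1]
    steps[0] = start
    return steps.cumsum()

arr = np.array([1, 2, 3, 3, 4, 5, 6, 1, 1, 1, 5, 5, 8, 8, 8, 9, 4, 4, 4])

# Mark the start of every run, plus one extra True as an end marker.
m = np.r_[True, arr[:-1] != arr[1:], True]
# Distances between consecutive markers are the run lengths.
sizes = np.diff(np.flatnonzero(m))
result = intervaled_ranges(sizes, start=1)

print(sizes.tolist())   # [1, 1, 2, 1, 1, 1, 3, 2, 3, 1, 3]
print(result.tolist())  # [1, 1, 1, 2, 1, 1, 1, 1, 2, 3, 1, 2, 1, 2, 3, 1, 1, 2, 3]
```

The idea is to fill an array with steps of 1, overwrite the step at each run boundary with a negative jump, and let a single cumsum produce all the ranges at once, which avoids a Python-level loop over the runs.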

edited Nov 12 '19 at 08:17

answered Nov 12 '19 at 08:07

Divakar


Thanks for your help - went with the one-liner on this one – Chapo Nov 13 '19 at 00:40
@Divakar, it would be helpful if you could also show how the solution can be extended to a DataFrame with multiple columns rather than a single-column pd.Series. I am not able to figure out how the 'groupby' would work in that case. – Siraj S. Nov 15 '19 at 13:21
one method (which is still iterative) is "pd.concat([s.groupby(pd.Grouper(i)).cumcount() for i in s.columns], axis=1, sort=False)", where "s = (s!=s.shift()).cumsum()" from @Henry Yik one liner above – Siraj S. Nov 15 '19 at 13:45
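
The per-column approach described in the last comment could be sketched roughly as follows. This is only an illustration under stated assumptions: the function name column_streaks and the sample DataFrame are made up here, and the loop over columns is kept, as in the comment. Note that NaN values never compare equal, so each NaN would start a new streak.

```python
import pandas as pd

def column_streaks(df):
    """Streak length per column (hypothetical helper, one groupby per column)."""
    # Per-column run ids: increment whenever a value differs from the row above.
    ids = (df != df.shift()).cumsum()
    # Count positions within each run, column by column, then reassemble.
    return pd.concat(
        {col: ids.groupby(col).cumcount() + 1 for col in ids.columns},
        axis=1,
    )

df = pd.DataFrame({"a": [1, 1, 2, 2, 2], "b": [5, 6, 6, 6, 7]})
out = column_streaks(df)
print(out)
```

For this sample, column a should come out as 1, 2, 1, 2, 3 and column b as 1, 1, 2, 3, 1.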

score 1 · Accepted Answer · answered Nov 12 '19 at 08:52


You can also use pd.Series and groupby:

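import pandas as pd

s = pd.Series([1,2,3,3,4,5,6,1,1,1,5,5,8,8,8,9,4,4,4])

# Label each run of equal values, then count positions within each run (1-based).
print((s.groupby((s != s.shift()).cumsum()).cumcount() + 1).tolist())
# [1, 1, 1, 2, 1, 1, 1, 1, 2, 3, 1, 2, 1, 2, 3, 1, 1, 2, 3]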
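
As a sanity check, the one-liner can be compared against the plain loop mentioned in the question. This is a minimal sketch; the function names streaks_loop and streaks_pandas are hypothetical and only used here.

```python
import pandas as pd

def streaks_loop(values):
    """Plain-Python baseline: length of the current run at each position."""
    out = []
    prev = object()  # sentinel that never equals a real value
    run = 0
    for v in values:
        run = run + 1 if v == prev else 1
        out.append(run)
        prev = v
    return out

def streaks_pandas(values):
    """Same result via the groupby/cumcount one-liner."""
    s = pd.Series(values)
    return (s.groupby((s != s.shift()).cumsum()).cumcount() + 1).tolist()

arr = [1, 2, 3, 3, 4, 5, 6, 1, 1, 1, 5, 5, 8, 8, 8, 9, 4, 4, 4]
print(streaks_loop(arr) == streaks_pandas(arr))  # True
```

The loop is easier to read and fine for short inputs; the pandas version is likely to pay off mainly on large arrays.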

answered Nov 12 '19 at 08:52

Henry Yik

