
I checked this question and others on SO, but the trick is always summing True or False values.

My case is the following array:

arr = [1,2,3,3,4,5,6,1,1,1,5,5,8,8,8,9,4,4,4]

For each element of the array, I want the length of the "current" streak of repeated values, counted up to and including that element.

For the example above, I would like to get:

res = [1,1,1,2,1,1,1,1,2,3,1,2,1,2,3,1,1,2,3]

I could write a simple loop, but is there a clever or built-in way to do this in NumPy/pandas?
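For reference, here is a minimal sketch of the plain loop the question alludes to (the function name streak_lengths_loop is hypothetical). It is handy as a correctness check for the vectorized answers: each element either extends the previous streak or starts a new one of length 1.

```python
def streak_lengths_loop(values):
    """Length of the run of equal values ending at each position (hypothetical helper)."""
    res = []
    for i, v in enumerate(values):
        if i > 0 and v == values[i - 1]:
            res.append(res[-1] + 1)  # same value as the previous element: extend the streak
        else:
            res.append(1)  # first element or a value change: start a new streak
    return res

arr = [1,2,3,3,4,5,6,1,1,1,5,5,8,8,8,9,4,4,4]
print(streak_lengths_loop(arr))
```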

Chapo
  • A very minor adaptation is needed for the solution you linked to work for your case... – Adam.Er8 Nov 12 '19 at 07:56
  • @Chapo Think you need to edit the title to reflect that you want to create a *ranged-array* instead, not just get the counts. – Divakar Nov 12 '19 at 08:07

2 Answers


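A pandas way, for an input array arr, would be the following (arr must be a NumPy array, e.g. arr = np.array(arr); with a plain list, arr[:-1]!=arr[1:] compares the two lists as wholes and returns a single bool):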

In [55]: I = np.r_[False,arr[:-1]!=arr[1:]].cumsum()

In [56]: df = pd.DataFrame({'ids':I,'val':np.ones(len(arr),dtype=int)})

In [57]: df.groupby('ids')[['val']].cumsum().values.ravel()
Out[57]: array([1, 1, 1, 2, 1, 1, 1, 1, 2, 3, 1, 2, 1, 2, 3, 1, 1, 2, 3])
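A self-contained sketch of the same idea, with the intermediate run ids made visible. It is a slight variant of the session above, not the answer's exact code: it uses groupby(...).cumcount() + 1 on a Series instead of a cumulative sum over a column of ones, which yields the same numbers.

```python
import numpy as np
import pandas as pd

arr = np.array([1,2,3,3,4,5,6,1,1,1,5,5,8,8,8,9,4,4,4])

# True wherever an element differs from its predecessor; False is prepended for element 0
changed = np.r_[False, arr[:-1] != arr[1:]]

# Cumulative count of changes gives every run of equal values its own integer id:
# [0, 1, 2, 2, 3, 4, 5, 6, 6, 6, 7, 7, 8, 8, 8, 9, 10, 10, 10]
run_ids = changed.cumsum()

# cumcount numbers rows 0, 1, 2, ... inside each run; +1 turns that into streak lengths
res = pd.Series(arr).groupby(run_ids).cumcount().to_numpy() + 1
print(res)
```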

Another option uses a custom NumPy implementation, intervaled_ranges, that creates ranges based on interval lengths:

In [91]: m = np.r_[True,arr[:-1]!=arr[1:],True]

In [92]: intervaled_ranges(np.diff(np.flatnonzero(m)),start=1)
Out[92]: array([1, 1, 1, 2, 1, 1, 1, 1, 2, 3, 1, 2, 1, 2, 3, 1, 1, 2, 3])
Divakar
  • thanks for your help, went with the one-liner on this one – Chapo Nov 13 '19 at 00:40
  • @Divakar, it would be helpful if you could also show how the solution can be extended to a DataFrame with multiple columns rather than a single pd.Series. I can't figure out how the groupby would work in that case. – Siraj S. Nov 15 '19 at 13:21
  • One method (which is still iterative over the columns) is "pd.concat([s.groupby(pd.Grouper(key=i)).cumcount() + 1 for i in s.columns], axis=1, sort=False)", where "s = (s != s.shift()).cumsum()" as in @Henry Yik's one-liner – Siraj S. Nov 15 '19 at 13:45
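A minimal sketch of the multi-column case raised in the comments, assuming each column should be treated as its own independent sequence. It applies the shift/cumsum/cumcount idea from the Series answer column by column with DataFrame.apply, which still loops over columns internally. The helper name col_streaks and the sample data are hypothetical.

```python
import pandas as pd

def col_streaks(col: pd.Series) -> pd.Series:
    """Streak length at each row of one column (hypothetical helper)."""
    # A new run id starts wherever the value differs from the row above
    run_ids = (col != col.shift()).cumsum()
    # Number rows within each run starting at 1
    return col.groupby(run_ids).cumcount() + 1

df = pd.DataFrame({"a": [1, 1, 2, 2, 2, 3], "b": [5, 6, 6, 6, 7, 7]})
res = df.apply(col_streaks)  # applied to each column; the result keeps df's index and columns
print(res)
```

One caveat of this comparison: NaN != NaN is True, so consecutive missing values each start a new streak of length 1.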

You can also use pd.Series and groupby:

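import pandas as pd

s = pd.Series([1,2,3,3,4,5,6,1,1,1,5,5,8,8,8,9,4,4,4])

# New run id wherever a value differs from the previous one; number rows within each run from 1
print((s.groupby((s != s.shift()).cumsum()).cumcount() + 1).tolist())
# [1, 1, 1, 2, 1, 1, 1, 1, 2, 3, 1, 2, 1, 2, 3, 1, 1, 2, 3]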
Henry Yik