
I am trying to calculate positive, negative and zero ("no") streaks using numpy exclusively. The issue I'm having is figuring out the groupby component of the calculation, which all my research has led me to believe I need. I found a pandas answer here: Pythonic way to calculate streaks in pandas dataframe

I've been able to convert everything but the groupby piece. Any help is appreciated.

Here is the pandas code I would like to reproduce. The only piece without a numpy equivalent is groupby. I also created my own shift function in numpy.

Pandas version:

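import numpy as np
import pandas as pd

def streaks(df, col):
    sign = np.sign(df[col])
    # a new group starts whenever the sign changes; cumsum within each group counts the streak
    s = sign.groupby((sign != sign.shift()).cumsum()).cumsum()
    return df.assign(u_streak=s.where(s > 0, 0.0),
                     d_streak=s.where(s < 0, 0.0).abs())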

My partial numpy version:

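import numpy as np

arr = np.array([0.2,0.1,0.1,0.0,-0.2,-0.1,0.0])
sign = np.sign(arr)
s = np.not_equal(sign, shift(sign))  # shift() is my own helper (not shown)
# now I need to group by np.cumsum(s) and take a cumulative sum within each group;
# groupby() below is the missing piece, not a real numpy function
np.cumsum(groupby(np.cumsum(s)))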
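The `shift` helper used above isn't shown, so here is a minimal sketch of one under an assumption: it mimics pandas `Series.shift(n)` for `0 <= n <= len(a)`, moving values down and filling the first slots with NaN. The signature `shift(a, n=1, fill=np.nan)` is hypothetical. It also shows that `np.cumsum(s)` already yields the group keys the pandas `groupby` uses.

```python
import numpy as np

def shift(a, n=1, fill=np.nan):
    """Hypothetical stand-in for the asker's unshown helper.

    Mimics pandas Series.shift(n) for 0 <= n <= len(a): values move down by n
    positions and the first n slots are filled with `fill`.
    """
    out = np.empty(len(a), dtype=float)
    out[:n] = fill
    out[n:] = a[:len(a) - n]
    return out

arr = np.array([0.2, 0.1, 0.1, 0.0, -0.2, -0.1, 0.0])
sign = np.sign(arr)
# NaN compares unequal to everything, so the first element always starts a group
s = np.not_equal(sign, shift(sign))
print(s)             # [ True False False  True  True False  True]
print(np.cumsum(s))  # [1 1 1 2 3 3 4]  (the keys pandas groups by)
```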

The expected result should be:

array([1.,2.,3.,0.,-1.,-2.,0.])
John Holmes
  • Possible duplicate of [Is there any numpy group by function?](https://stackoverflow.com/questions/38013778/is-there-any-numpy-group-by-function) – bjschoenfeld Aug 15 '19 at 22:24
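The "group by, then cumsum" step from the partial numpy version can also be emulated directly, as an alternative to the approach in the answer. A minimal sketch, assuming a non-empty 1-D input; `signed_streaks` is a hypothetical name. It marks the first element of each run of equal signs, takes one global cumulative sum of the signs, and subtracts the running total as it stood just before each run began, which turns the global cumsum into a per-run cumsum.

```python
import numpy as np

def signed_streaks(arr):
    """Per-run cumulative sum of np.sign(arr); restarts whenever the sign changes.

    Assumes a non-empty 1-D array.
    """
    sign = np.sign(arr)
    # True at the first element of every run of equal signs (the groupby keys)
    starts = np.r_[True, sign[1:] != sign[:-1]]
    # 0-based group id of every element
    group = np.cumsum(starts) - 1
    total = np.cumsum(sign)
    # running total just before each run begins
    base = (total - sign)[starts]
    # subtracting each run's offset gives a cumsum that restarts per run
    return total - base[group]

print(signed_streaks(np.array([0.2, 0.1, 0.1, 0.0, -0.2, -0.1, 0.0])))
# [ 1.  2.  3.  0. -1. -2.  0.]
print(signed_streaks(np.array([1.2, -1.2, 0.2, 0.1, 0.1, 0.0, -0.2, -0.1, 0.0])))
# [ 1. -1.  1.  2.  3.  0. -1. -2.  0.]
```

Because runs are split on any sign change, this does not need a 0 between positive and negative streaks.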

1 Answer


For a pure numpy version, you don't need any kind of groupby; you can do:

import numpy as np

arr = np.array([0.2,0.1,0.1,0.0,-0.2,-0.1,0.0])
sign = np.sign(arr)
s = np.abs(sign).cumsum() # or s = (arr != 0).cumsum()
streaks = (s - np.maximum.accumulate(np.where(arr == 0, s, 0)))*sign
print(streaks)
#[ 1.  2.  3.  0. -1. -2.  0.]

What it does: s increases by 1 every time the value in arr is not 0. Subtracting the cumulative maximum of s taken at the positions where arr is 0 "restarts" the count at 1 for the next streak, and multiplying by sign gives your expected output.
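To make those steps visible, here is the same computation with the intermediates exposed, run on the sample array and on an array where a positive value is immediately followed by a negative one. The wrapper name `streaks_v1` is hypothetical; the arithmetic is exactly the answer's.

```python
import numpy as np

def streaks_v1(arr):
    """The method above, returning its intermediate arrays for inspection."""
    sign = np.sign(arr)
    # running count of nonzero entries
    s = np.abs(sign).cumsum()
    # value of s at the most recent 0 (0 before any 0 has been seen)
    reset = np.maximum.accumulate(np.where(arr == 0, s, 0))
    return s, reset, (s - reset) * sign

s, reset, streaks = streaks_v1(np.array([0.2, 0.1, 0.1, 0.0, -0.2, -0.1, 0.0]))
print(s)        # [1. 2. 3. 3. 4. 5. 5.]
print(reset)    # [0. 0. 0. 3. 3. 3. 5.]
print(streaks)  # [ 1.  2.  3.  0. -1. -2.  0.]

# With no 0 between opposite-signed runs, the count is never reset:
_, _, bad = streaks_v1(np.array([1.2, -1.2, 0.2, 0.1, 0.1, 0.0, -0.2, -0.1, 0.0]))
print(bad)      # [ 1. -2.  3.  4.  5.  0. -1. -2.  0.]
```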

EDIT: the method above assumes there is a 0 between positive and negative streaks. To drop that assumption, you can split the positive and negative cases:

import numpy as np

arr = np.array([1.2,-1.2,0.2,0.1,0.1,0.0,-0.2,-0.1,0.0])
# running counts of positive entries and of negative entries
pos = np.clip(arr, 0, 1).astype(bool).cumsum()
neg = np.clip(arr, -1, 0).astype(bool).cumsum()
# positive streaks reset at every value <= 0, negative streaks at every value >= 0
streaks = np.where(arr >= 0, pos-np.maximum.accumulate(np.where(arr <= 0, pos, 0)),
                             -neg+np.maximum.accumulate(np.where(arr >= 0, neg, 0)))
print(streaks)
#[ 1 -1  1  2  3  0 -1 -2  0]
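The pandas version in the question returns two columns (u_streak and d_streak) rather than one signed array. If that shape is needed, a sketch of the same split with `np.where`, applied to the signed output above (hard-coded here so the snippet runs on its own):

```python
import numpy as np

# signed streaks as printed by the code above for the second sample array
streaks = np.array([1, -1, 1, 2, 3, 0, -1, -2, 0])
# mirrors s.where(s > 0, 0.0) and s.where(s < 0, 0.0).abs() from the pandas version
u_streak = np.where(streaks > 0, streaks, 0)
d_streak = np.where(streaks < 0, -streaks, 0)
print(u_streak)  # [1 0 1 2 3 0 0 0 0]
print(d_streak)  # [0 1 0 0 0 0 1 2 0]
```

The pandas version produces floats (0.0); use `0.0` as the fill value if that dtype matters.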
Ben.T
  • This worked for the sample array I provided, but when I change the values it doesn't work. Adding just two new elements, arr = np.array([1.2,-1.2,0.2,0.1,0.1,0.0,-0.2,-0.1,0.0]) gives a result of [ 1. -2. 3. 4. 5. 0. -1. -2. 0.] instead of [1., -1., 1., 2., 3., 0., -1., -2., 0.]. – John Holmes Aug 16 '19 at 01:48
  • @JohnHolmes indeed, I made the (wrong) assumption that there was a 0 between streaks – Ben.T Aug 16 '19 at 01:57
  • @BenT thank you, sir, much appreciated. I ran timeit on the DataFrame version vs. the numpy version, and your solution is 2x faster, which is a bonus. – John Holmes Aug 16 '19 at 02:47