In Pandas, how can I count consecutive positive and negatives in a row?

Question

In Python, using pandas or NumPy, is there a built-in function or a combination of functions that can count runs of consecutive positive or negative values?

This could be thought of as similar to a roulette wheel with the number of blacks or reds in a row.

Example input series data:

Date
2000-01-07    -3.550049
2000-01-10    28.609863
2000-01-11    -2.189941
2000-01-12     4.419922
2000-01-13    17.690185
2000-01-14    41.219971
2000-01-18     0.000000
2000-01-19   -16.330078
2000-01-20     7.950195
2000-01-21     0.000000
2000-01-24    38.370117
2000-01-25     6.060059
2000-01-26     3.579834
2000-01-27     7.669922
2000-01-28     2.739991
2000-01-31    -8.039795
2000-02-01    10.239990
2000-02-02    -1.580078
2000-02-03     1.669922
2000-02-04     7.440186
2000-02-07    -0.940185

Desired output:

-  in a row 5 times
+  in a row 4 times
++  in a row once
++++  in a row once
+++++++ in a row once

What do you mean by `consecutive positive and negatives in a row`? Can you give us a sample case? — Divakar, Aug 06 '16 at 15:52

Ghilas BELHADJ · Answer 1 · 2016-08-07T00:25:03.457

You can use the itertools.groupby() function.

import itertools

l = [-3.550049, 28.609863, -2.189941,  4.419922, 17.690185, 41.219971,  0.000000, -16.330078,  7.950195,  0.000000, 38.370117,  6.060059,  3.579834,  7.669922,  2.739991, -8.039795, 10.239990, -1.580078,  1.669922,  7.440186, -0.940185]

r_pos = {}
r_neg = {}
# Group consecutive values by sign; zero is treated as positive here.
for k, v in itertools.groupby(l, lambda e: e >= 0):
    count = len(list(v))
    r = r_pos if k else r_neg
    r[count] = r.get(count, 0) + 1

# Sort by run length so the output order is stable.
for k, v in sorted(r_neg.items()):
    print('%s in a row %s time(s)' % ('-' * k, v))

for k, v in sorted(r_pos.items()):
    print('%s in a row %s time(s)' % ('+' * k, v))

Output:

- in a row 6 time(s)
+ in a row 2 time(s)
++ in a row 1 time(s)
++++ in a row 1 time(s)
+++++++ in a row 1 time(s)

Depending on what you consider a positive value, you can change the predicate lambda e: e >= 0. For example, lambda e: e > 0 groups zeros with the negatives instead.
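How much the predicate matters can be seen by running the same tally twice on the question's data. The following is only a sketch: the helper name run_length_counts and the use of collections.Counter are illustrative choices, not part of the answer above.

```python
from collections import Counter
from itertools import groupby

values = [-3.550049, 28.609863, -2.189941, 4.419922, 17.690185, 41.219971,
          0.000000, -16.330078, 7.950195, 0.000000, 38.370117, 6.060059,
          3.579834, 7.669922, 2.739991, -8.039795, 10.239990, -1.580078,
          1.669922, 7.440186, -0.940185]


def run_length_counts(data, is_positive):
    """Hypothetical helper: return (positive, negative) Counters
    mapping run length -> number of runs of that length."""
    pos, neg = Counter(), Counter()
    for flag, run in groupby(data, key=is_positive):
        length = sum(1 for _ in run)
        (pos if flag else neg)[length] += 1
    return pos, neg


# Zero counted as positive.
pos_ge, neg_ge = run_length_counts(values, lambda e: e >= 0)
# Zero counted as negative.
pos_gt, neg_gt = run_length_counts(values, lambda e: e > 0)

print(sorted(pos_ge.items()), sorted(neg_ge.items()))
print(sorted(pos_gt.items()), sorted(neg_gt.items()))
```

With e >= 0 the two zeros extend the neighbouring positive runs (lengths 4 and 7); with e > 0 they break those runs and one zero joins a negative run of length 2.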

Thanks for the suggestions. With this and the previous answer combined I was able to put together something that works for what I need. Awesome! — nk abram, Aug 08 '16 at 11:17

ayhan · Answer 2 · 2016-08-07T00:21:48.027

Nonnegatives:

from functools import reduce  # For Python 3.x
ser = df['x'] >= 0
c = ser.expanding().apply(lambda r: reduce(lambda x, y: x + 1 if y else x * y, r))
c[ser & (ser != ser.shift(-1))].value_counts()
Out: 
1.0    2
7.0    1
4.0    1
2.0    1
Name: x, dtype: int64

Negatives:

ser = df['x'] < 0
c = ser.expanding().apply(lambda r: reduce(lambda x, y: x + 1 if y else x * y, r))
c[ser & (ser != ser.shift(-1))].value_counts()

Out: 
1.0    6
Name: x, dtype: int64

Basically, it creates a boolean series and takes the cumulative count between the turning points (when the sign changes, the count starts over). For example, for nonnegatives, c is:

Out: 
0     0.0
1     1.0  # turning point
2     0.0
3     1.0
4     2.0
5     3.0
6     4.0  # turning point
7     0.0
8     1.0
9     2.0
10    3.0
11    4.0
12    5.0
13    6.0
14    7.0  # turning point
15    0.0
16    1.0  # turning point
17    0.0
18    1.0
19    2.0  # turning point
20    0.0
Name: x, dtype: float64

Now, to identify the turning points, the condition is that the current value is True and differs from the next value. If you select those rows, you have the run lengths, and value_counts() tallies them.
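The same turning-point idea can be sketched without the expanding apply, by labelling each run with a cumulative sum of sign changes. This is an alternative technique, not the code from this answer, and the names run_id, lengths and flags are illustrative. It assumes the data sits in a column x of a DataFrame df, as above.

```python
import pandas as pd

df = pd.DataFrame({"x": [
    -3.550049, 28.609863, -2.189941, 4.419922, 17.690185, 41.219971,
    0.000000, -16.330078, 7.950195, 0.000000, 38.370117, 6.060059,
    3.579834, 7.669922, 2.739991, -8.039795, 10.239990, -1.580078,
    1.669922, 7.440186, -0.940185]})

nonneg = df["x"] >= 0

# A new run starts wherever the flag differs from the previous row,
# so the cumulative sum of those change points gives each run an id.
run_id = (nonneg != nonneg.shift()).cumsum()

lengths = nonneg.groupby(run_id).size()               # length of each run
flags = nonneg.groupby(run_id).first().astype(bool)   # True for nonnegative runs

nonneg_counts = lengths[flags].value_counts().sort_index()
neg_counts = lengths[~flags].value_counts().sort_index()

print(nonneg_counts)
print(neg_counts)
```

Each step here is a single vectorized pass, whereas the expanding apply re-reduces a growing window for every row.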

Thanks for the help. I wasn't able to get your suggestion to work, but with combining your answer with Ghilas' answer below, I was able to hash out something that works well for what I need. Thanks again! — nk abram, Aug 08 '16 at 11:16
this method takes a long time / not very performant... try this solution instead: https://stackoverflow.com/questions/27626542/counting-consecutive-positive-values-in-python-pandas-array — Chris, Nov 08 '22 at 18:00

nk abram · Answer 3 · 2016-08-08T12:46:29.637

So far, this is what I've come up with. It works and outputs a count of how many times each run length occurs for negative, positive, and zero values. Maybe someone can make it more concise using some of the suggestions posted by ayhan and Ghilas above.

from collections import Counter

import numpy as np

ser = [-3.550049, 28.609863, -2.189941, 4.419922, 17.690185, 41.219971, 0.000000, -16.330078, 7.950195, 0.000000, 38.370117, 6.060059, 3.579834, 7.669922, 2.739991, -8.039795, 10.239990, -1.580078, 1.669922, 7.440186, -0.940185]

c = 0  # length of the current run
zeros, neg_counts, pos_counts = [], [], []
for i in range(len(ser)):
    c += 1
    s = np.sign(ser[i])
    try:
        # A run ends when the next value has a different sign.
        if s != np.sign(ser[i+1]):
            if s == 0:
                zeros.append(c)
            elif s == -1:
                neg_counts.append(c)
            elif s == 1:
                pos_counts.append(c)
            c = 0
    except IndexError:
        # Last element: close the final run.
        pos_counts.append(c) if s == 1 else neg_counts.append(c) if s == -1 else zeros.append(c)

print(Counter(neg_counts))
print(Counter(pos_counts))
print(Counter(zeros))

Out:

Counter({1: 6})
Counter({1: 3, 3: 1, 5: 1, 2: 1})
Counter({1: 2})
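A more concise variant of the same three-way tally (negative, positive, zero) can be sketched with NumPy alone. This is one possible approach under the assumption that the data fits in a 1-D array; the helper name tally is hypothetical.

```python
import numpy as np

values = np.array([
    -3.550049, 28.609863, -2.189941, 4.419922, 17.690185, 41.219971,
    0.000000, -16.330078, 7.950195, 0.000000, 38.370117, 6.060059,
    3.579834, 7.669922, 2.739991, -8.039795, 10.239990, -1.580078,
    1.669922, 7.440186, -0.940185])

signs = np.sign(values)  # -1.0, 0.0 or 1.0 per element

# Index of the first element of every run: position 0, plus every
# position whose sign differs from the one before it.
starts = np.flatnonzero(np.r_[True, signs[1:] != signs[:-1]])

# Run lengths are the gaps between consecutive run starts.
lengths = np.diff(np.r_[starts, len(signs)])
run_signs = signs[starts]


def tally(sign):
    """Hypothetical helper: map run length -> count for runs of one sign."""
    uniq, cnt = np.unique(lengths[run_signs == sign], return_counts=True)
    return dict(zip(uniq.tolist(), cnt.tolist()))


neg, pos, zero = tally(-1), tally(1), tally(0)
print(neg, pos, zero)
```

Keeping zeros as their own category avoids having to decide whether a zero extends a positive or a negative run.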
