
In Python, using pandas or NumPy, is there a built-in function, or a combination of functions, that can count how many positive or negative values occur in a row?

This could be thought of as similar to a roulette wheel with the number of blacks or reds in a row.

Example input series data:

Date
2000-01-07    -3.550049
2000-01-10    28.609863
2000-01-11    -2.189941
2000-01-12     4.419922
2000-01-13    17.690185
2000-01-14    41.219971
2000-01-18     0.000000
2000-01-19   -16.330078
2000-01-20     7.950195
2000-01-21     0.000000
2000-01-24    38.370117
2000-01-25     6.060059
2000-01-26     3.579834
2000-01-27     7.669922
2000-01-28     2.739991
2000-01-31    -8.039795
2000-02-01    10.239990
2000-02-02    -1.580078
2000-02-03     1.669922
2000-02-04     7.440186
2000-02-07    -0.940185

Desired output:

-  in a row 5 times
+  in a row 4 times
++  in a row once
++++  in a row once
+++++++ in a row once
nk abram

3 Answers


You can use the itertools.groupby() function.

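import itertools

l = [-3.550049, 28.609863, -2.189941,  4.419922, 17.690185, 41.219971,  0.000000, -16.330078,  7.950195,  0.000000, 38.370117,  6.060059,  3.579834,  7.669922,  2.739991, -8.039795, 10.239990, -1.580078,  1.669922,  7.440186, -0.940185]

r_pos = {}  # run length -> number of runs where the key is True
r_neg = {}  # run length -> number of runs where the key is False
# Zeros are counted as positive here, which is what produces the output below
for k, v in itertools.groupby(l, lambda e: e >= 0):
    count = len(list(v))  # length of the current run
    r = r_pos if k else r_neg
    r[count] = r.get(count, 0) + 1

# Sort by run length so the output order does not depend on dict ordering
for k, v in sorted(r_neg.items()):
    print('%s in a row %s time(s)' % ('-' * k, v))

for k, v in sorted(r_pos.items()):
    print('%s in a row %s time(s)' % ('+' * k, v))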

Output:

- in a row 6 time(s)
+ in a row 2 time(s)
++ in a row 1 time(s)
++++ in a row 1 time(s)
+++++++ in a row 1 time(s)

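Depending on what you consider a positive value, you can change the key function passed to groupby: lambda e: e >= 0 counts zeros as positive, while lambda e: e > 0 groups them with the negatives.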
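To see how the choice of key changes the tally, here is a minimal sketch that wraps the same groupby idea in a hypothetical helper, run_length_counts (not part of any library), and uses collections.Counter instead of the manual dictionaries. It assumes the data is a plain Python list of floats.

```python
import itertools
from collections import Counter

values = [-3.550049, 28.609863, -2.189941, 4.419922, 17.690185, 41.219971,
          0.000000, -16.330078, 7.950195, 0.000000, 38.370117, 6.060059,
          3.579834, 7.669922, 2.739991, -8.039795, 10.239990, -1.580078,
          1.669922, 7.440186, -0.940185]


def run_length_counts(data, key):
    """Hypothetical helper: map each key value to a Counter of {run length: number of runs}."""
    result = {}
    for k, group in itertools.groupby(data, key):
        # group iterates over one maximal run of consecutive items with the same key
        result.setdefault(k, Counter())[sum(1 for _ in group)] += 1
    return result


nonneg = run_length_counts(values, lambda e: e >= 0)  # zeros count as positive
strict = run_length_counts(values, lambda e: e > 0)   # zeros go with the negatives

print(sorted(nonneg[False].items()), sorted(nonneg[True].items()))
print(sorted(strict[False].items()), sorted(strict[True].items()))
```

With e >= 0 this gives the same tallies as the output above. With e > 0 the two zeros fall on the False side, so the positive runs of 4 and 7 break into runs of 3, 1 and 5, and the False side gains a run of length 2 (0.0 followed by -16.330078).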

Ghilas BELHADJ
  • Thanks for the suggestions. With this and the previous answer combined I was able to put together something that works for what I need. Awesome! – nk abram Aug 08 '16 at 11:17

Nonnegatives:

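import pandas as pd
from functools import reduce  # reduce is not a builtin in Python 3.x

df = pd.DataFrame({'x': [-3.550049, 28.609863, -2.189941, 4.419922, 17.690185, 41.219971, 0.000000, -16.330078, 7.950195, 0.000000, 38.370117, 6.060059, 3.579834, 7.669922, 2.739991, -8.039795, 10.239990, -1.580078, 1.669922, 7.440186, -0.940185]})
ser = df['x'] >= 0
# raw=True passes each window as a NumPy array instead of a Series
c = ser.expanding().apply(lambda r: reduce(lambda x, y: x + 1 if y else x * y, r), raw=True)
c[ser & (ser != ser.shift(-1))].value_counts()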
Out: 
1.0    2
7.0    1
4.0    1
2.0    1
Name: x, dtype: int64

Negatives:

ser = df['x'] < 0
c = ser.expanding().apply(lambda r: reduce(lambda x, y: x + 1 if y else x * y, r))
c[ser & (ser != ser.shift(-1))].value_counts()

Out: 
1.0    6
Name: x, dtype: int64

Basically, it creates a boolean series and takes the cumulative count of True values, which starts over at zero whenever the condition is False (that is, when the sign changes). For example, for nonnegatives, c is:

Out: 
0     0.0
1     1.0  # turning point
2     0.0
3     1.0
4     2.0
5     3.0
6     4.0  # turning point
7     0.0
8     1.0
9     2.0
10    3.0
11    4.0
12    5.0
13    6.0
14    7.0  # turning point
15    0.0
16    1.0  # turning point
17    0.0
18    1.0
19    2.0  # turning point
20    0.0
Name: x, dtype: float64
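The expanding().apply call reruns reduce over the whole window for every row, so its cost grows roughly quadratically with the length of the series. Because the recurrence only needs the previous count, the same column can be built in a single pass. A minimal plain-Python sketch, assuming the data is a list and using the nonnegative condition:

```python
values = [-3.550049, 28.609863, -2.189941, 4.419922, 17.690185, 41.219971,
          0.000000, -16.330078, 7.950195, 0.000000, 38.370117, 6.060059,
          3.579834, 7.669922, 2.739991, -8.039795, 10.239990, -1.580078,
          1.669922, 7.440186, -0.940185]

flags = [v >= 0 for v in values]  # same condition as ser = df['x'] >= 0

c = []
count = 0
for flag in flags:
    # Add one while the condition holds; start over at zero when it fails
    count = count + 1 if flag else 0
    c.append(count)

print(c)
```

The resulting list holds the same values as the column above, as integers rather than floats.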

To identify the turning points, select the positions where the value is True and differs from the next value. The values of c at those positions are the run lengths, and value_counts() tallies how often each length occurs.
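The same turning-point idea can also be vectorized, which avoids the per-row reduce. The sketch below is a common pandas idiom rather than this answer's code: each run gets an id from the cumulative sum of condition flips, and groupby(...).size() measures the runs. It assumes the data is an unnamed Series called x and treats zeros as nonnegative.

```python
import pandas as pd

x = pd.Series([-3.550049, 28.609863, -2.189941, 4.419922, 17.690185, 41.219971,
               0.000000, -16.330078, 7.950195, 0.000000, 38.370117, 6.060059,
               3.579834, 7.669922, 2.739991, -8.039795, 10.239990, -1.580078,
               1.669922, 7.440186, -0.940185])

flag = x >= 0                             # condition that defines a "positive" value
run_id = (flag != flag.shift()).cumsum()  # increments every time the condition flips
grouped = flag.groupby(run_id)
sizes = grouped.size()                    # length of each run
holds = grouped.first().astype(bool)      # whether the condition holds in that run

# How often each run length occurs, on each side of the condition
pos_runs = sizes[holds].value_counts().sort_index()
neg_runs = sizes[~holds].value_counts().sort_index()
print(pos_runs)
print(neg_runs)
```

Replacing x >= 0 with x > 0 changes which side the zeros fall on.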

ayhan
  • Thanks for the help. I wasn't able to get your suggestion to work, but with combining your answer with Ghilas' answer below, I was able to hash out something that works well for what I need. Thanks again! – nk abram Aug 08 '16 at 11:16
  • this method takes a long time / not very performant... try this solution instead: https://stackoverflow.com/questions/27626542/counting-consecutive-positive-values-in-python-pandas-array – Chris Nov 08 '22 at 18:00

So far, this is what I've come up with. It works and outputs, separately for negative, positive and zero values, how many times each run length occurs. Maybe someone can make it more concise using some of the suggestions posted by ayhan and Ghilas above.

from collections import Counter
import numpy as np

ser = [-3.550049, 28.609863, -2.189941, 4.419922, 17.690185, 41.219971, 0.000000, -16.330078, 7.950195, 0.000000, 38.370117, 6.060059, 3.579834, 7.669922, 2.739991, -8.039795, 10.239990, -1.580078, 1.669922, 7.440186, -0.940185]

c = 0  # length of the current run
zeros, neg_counts, pos_counts = [], [], []
for i in range(len(ser)):
    c += 1
    s = np.sign(ser[i])
    # A run ends at the last element or where the next value has a different sign
    if i == len(ser) - 1 or s != np.sign(ser[i + 1]):
        if s == 0:
            zeros.append(c)
        elif s == -1:
            neg_counts.append(c)
        elif s == 1:
            pos_counts.append(c)
        c = 0

print(Counter(neg_counts))
print(Counter(pos_counts))
print(Counter(zeros))

Out:

Counter({1: 6})
Counter({1: 3, 3: 1, 5: 1, 2: 1})
Counter({1: 2})
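A more concise sketch of the same three-way split: itertools.groupby can group directly on np.sign, and each run length goes into one Counter per sign (-1.0, 0.0 and 1.0). This is an alternative formulation rather than a fix to the loop above, and it assumes the question's data as a plain list.

```python
import itertools
from collections import Counter

import numpy as np

ser = [-3.550049, 28.609863, -2.189941, 4.419922, 17.690185, 41.219971,
       0.000000, -16.330078, 7.950195, 0.000000, 38.370117, 6.060059,
       3.579834, 7.669922, 2.739991, -8.039795, 10.239990, -1.580078,
       1.669922, 7.440186, -0.940185]

# One Counter of {run length: number of runs} for each sign
runs = {-1.0: Counter(), 0.0: Counter(), 1.0: Counter()}
for sign, group in itertools.groupby(ser, key=np.sign):
    runs[sign][len(list(group))] += 1

print(runs[-1.0])  # negatives
print(runs[1.0])   # positives
print(runs[0.0])   # zeros
```

np.sign returns NumPy floats, which compare and hash equal to the plain float keys of runs, so the lookups work without conversion.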
nk abram