
I have a Pandas (pandas==0.23.4) datetime-indexed dataframe df with a column named value_id.

value_id contains runs of float values (either 5.0 or 6.0) separated by runs of NaN. I would like to count the number of contiguous groups of 5.0 and of 6.0 separately. A group only counts if it contains at least three consecutive values.

For example:

In [1]: print df.value_id
timestamp
2019-01-06 17:42:08    NaN
2019-01-06 17:45:08    5.0
2019-01-06 17:48:08    5.0
2019-01-06 17:51:08    5.0
2019-01-06 17:54:08    NaN
2019-01-06 17:57:08    NaN
2019-01-06 18:00:08    NaN
2019-01-06 18:03:08    NaN
2019-01-06 18:06:08    NaN
2019-01-06 18:09:08    NaN
2019-01-06 18:12:08    6.0
2019-01-06 18:15:08    6.0
2019-01-06 19:54:09    NaN
2019-01-06 19:57:09    5.0
2019-01-06 20:00:08    5.0
2019-01-06 20:03:08    5.0
2019-01-06 20:06:09    NaN
2019-01-06 20:09:08    NaN
2019-01-06 20:12:08    NaN
2019-01-06 20:15:09    NaN
2019-01-06 20:18:08    NaN
2019-01-06 20:21:09    NaN
2019-01-06 20:24:09    NaN
2019-01-07 19:09:07    NaN
2019-01-07 19:12:06    NaN
2019-01-07 19:15:06    5.0
2019-01-07 19:18:06    5.0
2019-01-07 19:21:07    5.0
2019-01-07 19:24:07    5.0
2019-01-07 19:27:07    NaN
2019-01-07 19:30:07    NaN
2019-01-07 19:33:06    NaN
2019-01-07 19:36:07    NaN
2019-01-07 19:39:07    NaN
2019-01-07 19:42:06    NaN
2019-01-07 19:45:06    NaN
2019-01-07 19:48:06    NaN
2019-01-07 19:51:06    6.0
2019-01-07 19:54:07    6.0
2019-01-07 19:57:06    6.0
Name: value_id, dtype: float64

If I had two variables named count1 (for the 5.0 value groups) and count2 (for the 6.0 value groups), the resulting counts assigned for the above example would be:

count1: 3

count2: 1

pjw

2 Answers


Perhaps not the most elegant, but you can use shift to check both that the next two items hold the same value and that the previous item does not. That flags only the first row of each run of at least three, so summing the flags gives the number of groups:

df['fives'] = ((df['timestamp'] == 5) & (df['timestamp'].shift(-1) == 5)
                & (df['timestamp'].shift(-2) == 5)
                & (df['timestamp'].shift(1) != 5)).astype(int)
df['sixes'] = ((df['timestamp'] == 6) & (df['timestamp'].shift(-1) == 6)
                & (df['timestamp'].shift(-2) == 6)
                & (df['timestamp'].shift(1) != 6)).astype(int)

df[['fives','sixes']].sum()
fives    3
sixes    1
dtype: int64
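
A minimal sketch of the same shift-based check, written against a `value_id` column as in the question. The sample frame is a shortened stand-in with made-up timestamps, and `count_run_starts` is a hypothetical helper name, not part of pandas:

```python
import numpy as np
import pandas as pd

# Shortened stand-in for the question's data (assumption: timestamps are invented).
values = [np.nan, 5, 5, 5, np.nan, 6, 6, np.nan,
          5, 5, 5, np.nan, 5, 5, 5, 5, np.nan, 6, 6, 6]
index = pd.date_range("2019-01-06 17:42:08", periods=len(values), freq="3min")
df = pd.DataFrame({"value_id": values}, index=index)
df.index.name = "timestamp"


def count_run_starts(series, target):
    """Count rows that open a run of at least three consecutive `target` values."""
    starts = (
        (series == target)
        & (series.shift(-1) == target)  # next row matches
        & (series.shift(-2) == target)  # row after next matches
        & (series.shift(1) != target)   # previous row differs, so this row starts the run
    )
    return int(starts.sum())


count1 = count_run_starts(df["value_id"], 5.0)
count2 = count_run_starts(df["value_id"], 6.0)
print("count1:", count1)
print("count2:", count2)
```

The minimum run length of three is hard-coded in the two forward shifts here, so a different threshold would need more (or fewer) shifted comparisons.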
Brendan
  • I think you intended to have `df['value_id']`. `timestamp` is the name of the datetime index. But yes, this works. – pjw Jul 18 '19 at 22:04
  • @PJW That's how your sample data copied into pandas with `read_clipboard` - a bit of laziness on my part to not rename it. WeNYoBen's solution is definitely more elegant. – Brendan Jul 18 '19 at 23:40

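If I understand correctly, you can create the group key with isnull().cumsum() (it increases at every NaN, so each run of values between NaNs gets its own key), then call value_counts within each group and count how many groups reach at least 3. Here s is the series of values: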

s.groupby(s.isnull().cumsum()).value_counts().ge(3).sum(level=1)
Out[1026]: 
timestamp
5.0    3.0
6.0    1.0
Name: timestamp, dtype: float64
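
A hedged variant of the same idea, sketched for a small stand-in series rather than the full data. It labels runs by comparing each value with the previous one instead of keying on NaN, and it avoids `sum(level=1)`, which as far as I know is no longer available in pandas 2.x. The names `run_id`, `runs`, and `long_runs` are illustrative only:

```python
import numpy as np
import pandas as pd

# Shortened stand-in for the question's series (assumption: plain integer index).
s = pd.Series(
    [np.nan, 5, 5, 5, np.nan, 6, 6, np.nan,
     5, 5, 5, np.nan, 5, 5, 5, 5, np.nan, 6, 6, 6],
    name="value_id",
)

# A new run starts wherever the value differs from the previous row.
# NaN never equals NaN, so every NaN gets its own label; those rows are dropped below.
run_id = s.ne(s.shift()).cumsum()

valid = s.notna()
# One row per run: the value the run holds and how many rows it spans.
runs = s[valid].groupby(run_id[valid]).agg(["first", "count"])

# Keep runs of at least three rows, then count them per value.
long_runs = runs[runs["count"] >= 3]
counts = long_runs["first"].value_counts()

count1 = int(counts.get(5.0, 0))
count2 = int(counts.get(6.0, 0))
print("count1:", count1)
print("count2:", count2)
```

On data like the question's, where every run is bounded by NaN, this should agree with the cumsum key above. The two differ only if different values sit next to each other without a NaN in between: the NaN-based key lumps them into one group, while the run labels keep them apart.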
BENY