How to identify consecutive same values pandas

Question

I have a data fame like that :

Timestamp	Value
2021-04-21 14:22:00	0
2021-04-21 14:23:00	0
2021-04-21 14:24:00	0
2021-04-21 14:25:00	1
2021-04-21 14:26:00	1
2021-04-21 14:27:00	0
2021-04-21 14:28:00	1
2021-04-21 14:29:00	1
2021-04-21 14:30:00	1

I wan to get the consecutive same values like that :

Begin Date	Another header	Consecutive values
2021-04-21 14:25:00	2021-04-21 14:26:00	2
2021-04-21 14:28:00	2021-04-21 14:30:00	3

I tried some solutions with numpy shift, or by using for / while loop, but it don't work ... Also, I saw some topics in stack overflow indeed.

Thanks !

Does this answer your question? [How to use pandas to find consecutive same data in time series](https://stackoverflow.com/questions/26911851/how-to-use-pandas-to-find-consecutive-same-data-in-time-series) — Bushmaster, Oct 03 '22 at 09:53

score 0 · Answer 1 · answered Oct 03 '22 at 09:52

0

You can use a custom groupby.agg:

# identify rows with value=1
m = df['Value'].eq(1)

# filter, groupby consecutive values
out = (df[m].groupby((~m).cumsum())
             # aggregate with first/min, last/max date, and count
            .agg(**{'Begin Date': ('Timestamp', 'first'),    # or min
                    'Another header': ('Timestamp', 'last'), # or max
                    'Consecutive values': ('Timestamp', 'count')})
      )

output:

                Begin Date       Another header  Consecutive values
Value                                                              
3      2021-04-21 14:25:00  2021-04-21 14:26:00                   2
4      2021-04-21 14:28:00  2021-04-21 14:30:00                   3

answered Oct 03 '22 at 09:52

mozway

194,879
13
39
75

I think it is better to drop the index here (or add them as a column and make sure they have the right value) – T C Molenaar Oct 03 '22 at 10:03
1

@TCMolenaar you can prevent using the grouper as index in groupby or `reset_index(drop=True)` if needed ;) – mozway Oct 03 '22 at 10:09
I know, but don't you think it's better to add that? It's a bit confusing now as there is no value 3 or 4 – T C Molenaar Oct 03 '22 at 12:17

score 0 · Accepted Answer · answered Oct 03 '22 at 10:01

You can first group the values by using df.diff(1) and .cumsum():

df['group'] = (df.Value.diff(1) != 0).cumsum()

Now we can use pd.groupby() on this column to create your new dataframe:

out = pd.DataFrame({'Begin Date' : df.groupby('group').Timestamp.first(), 
                    'Another header' : df.groupby('group').Timestamp.last(),
                    'Consecutive values' : df.groupby('group').size(),
                    'Value' : df.groupby('group').Value.first()}).reset_index(drop=True)

Output:

             Begin Date        Another header  Consecutive values  Value
0  2021-04-21 14:22:00   2021-04-21 14:24:00                    3      0
1  2021-04-21 14:25:00   2021-04-21 14:26:00                    2      1
2  2021-04-21 14:27:00   2021-04-21 14:27:00                    1      0
3  2021-04-21 14:28:00   2021-04-21 14:30:00                    3      1

I deliberately show here the complete result (including consecutive zeroes). If you wish to delete those you can filter by doing out[out.Value != 0]

How to identify consecutive same values pandas

2 Answers2