How to use pandas to find consecutive same data in time series

Question

Here is some time series data like this; call it df:

      'No'       'Date'       'Value'
0     600000     1999-11-10    1
1     600000     1999-11-11    1
2     600000     1999-11-12    1
3     600000     1999-11-15    1
4     600000     1999-11-16    1
5     600000     1999-11-17    1
6     600000     1999-11-18    0
7     600000     1999-11-19    1
8     600000     1999-11-22    1
9     600000     1999-11-23    1
10    600000     1999-11-24    1
11    600000     1999-11-25    0
12    600001     1999-11-26    1
13    600001     1999-11-29    1
14    600001     1999-11-30    0

I want to get the date range of each run of consecutive 'Value' equal to 1, together with the run length. How can I get the following result?

   'No'     'BeginDate'    'EndDate'   'Consecutive'
0 600000    1999-11-10    1999-11-17    6
1 600000    1999-11-19    1999-11-24    4
2 600001    1999-11-26    1999-11-29    2

Here are the basic tools; the rest you can figure out on your own: use `groupby` on the `No` column and then, within each group, compute `df.Value - df.Value.shift(1)` and check where it is not equal to zero. — acushner, Nov 17 '14 at 17:13
Related question: https://stackoverflow.com/questions/45886518/identify-consecutive-same-values-in-pandas-dataframe-with-a-groupby — Anton Tarasenko, Aug 14 '19 at 16:03
Related question: https://stackoverflow.com/questions/40802800/pandas-dataframe-how-to-groupby-consecutive-values — Anton Tarasenko, Aug 14 '19 at 16:04
[Run-length encoding](https://en.wikipedia.org/wiki/Run-length_encoding) maybe — Xpector, May 20 '20 at 11:00
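The hint above (group by No, then compare each Value with the previous one) amounts to run-length encoding within each No. A minimal sketch, assuming Date is parsed as datetimes; the sample df is rebuilt by hand and the helper names prev, run, rle and result are hypothetical:

```python
import pandas as pd

# Hypothetical reconstruction of the question's df (Date parsed as datetimes)
dates = ['1999-11-10', '1999-11-11', '1999-11-12', '1999-11-15', '1999-11-16',
         '1999-11-17', '1999-11-18', '1999-11-19', '1999-11-22', '1999-11-23',
         '1999-11-24', '1999-11-25', '1999-11-26', '1999-11-29', '1999-11-30']
df = pd.DataFrame({'No': [600000] * 12 + [600001] * 3,
                   'Date': pd.to_datetime(dates),
                   'Value': [1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0]})

# Previous Value within the same No; the first row of each No gets NaN,
# and NaN compares unequal to anything, so that row always opens a new run
prev = df.groupby('No')['Value'].shift()
df['run'] = (df['Value'] != prev).cumsum()

# Run-length encoding: one row per run (Value has no missing entries,
# so 'count' equals the run length)
rle = df.groupby('run').agg(No=('No', 'first'),
                            Value=('Value', 'first'),
                            BeginDate=('Date', 'first'),
                            EndDate=('Date', 'last'),
                            Consecutive=('Value', 'count'))

# Keep only the runs of 1s
result = rle[rle['Value'] == 1].drop(columns='Value').reset_index(drop=True)
print(result)
```

Because the shift happens inside each No group, a run can never extend across two different No values, even if the Values on either side of the boundary happen to match.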

user1827356 · Accepted Answer · 2014-11-20T17:28:53.633

score 46

This should do it:

import pandas as pd

df['value_grp'] = (df.Value.diff(1) != 0).astype('int').cumsum()

value_grp increments by one whenever Value changes, so each run of identical values gets its own label. You can then extract the per-group results:

pd.DataFrame({'BeginDate' : df.groupby('value_grp').Date.first(), 
              'EndDate' : df.groupby('value_grp').Date.last(),
              'Consecutive' : df.groupby('value_grp').size(), 
              'No' : df.groupby('value_grp').No.first()}).reset_index(drop=True)
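For reference, here is the whole accepted-answer pipeline run on the question's sample data, with the Consecutive > 1 filter the answerer suggests in the comments plus an extra Value == 1 filter (an assumption, so that runs of 0s are never reported). It is a sketch that rebuilds df by hand; g and out are illustrative names. Note that value_grp only looks at Value, so it would not split a run that continues across a change in No; in this sample every change in No happens to coincide with a change in Value.

```python
import pandas as pd

# Hypothetical reconstruction of the question's df (Date parsed as datetimes)
dates = ['1999-11-10', '1999-11-11', '1999-11-12', '1999-11-15', '1999-11-16',
         '1999-11-17', '1999-11-18', '1999-11-19', '1999-11-22', '1999-11-23',
         '1999-11-24', '1999-11-25', '1999-11-26', '1999-11-29', '1999-11-30']
df = pd.DataFrame({'No': [600000] * 12 + [600001] * 3,
                   'Date': pd.to_datetime(dates),
                   'Value': [1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0]})

# New label whenever Value changes; the first diff is NaN, and NaN != 0 is True
df['value_grp'] = (df.Value.diff(1) != 0).astype('int').cumsum()

g = df.groupby('value_grp')
out = pd.DataFrame({'BeginDate': g.Date.first(),
                    'EndDate': g.Date.last(),
                    'Consecutive': g.size(),
                    'No': g.No.first(),
                    'Value': g.Value.first()}).reset_index(drop=True)

# Keep runs of 1s longer than one row, then drop the helper Value column
out = (out[(out.Value == 1) & (out.Consecutive > 1)]
       .drop(columns='Value')
       .reset_index(drop=True))
print(df['value_grp'].tolist())
print(out)
```

On this data value_grp comes out as [1, 1, 1, 1, 1, 1, 2, 3, 3, 3, 3, 4, 5, 5, 6], and the filtered frame has the three rows from the question.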

edited Nov 20 '14 at 17:28

answered Nov 13 '14 at 16:47

user1827356

Hi user1827356, thanks for your quick answer, but the result is not what I want; you can see the result I listed below your answer. – figo Nov 14 '14 at 01:05
@figo, my bad. There was a typo in the value_grp calculation. Can you recheck? You can filter on Consecutive > 1 to get your exact answer. – user1827356 Nov 20 '14 at 17:24
Note that if `df.Value` is not numeric, you can still do `(df.Value != df.Value.shift()).cumsum()` (no `.astype(int)` needed). – BallpointBen Feb 19 '19 at 13:37
It's worth noting that you don't actually need the `astype(int)` there; pandas is perfectly happy to sum Boolean values. – MTKnife May 05 '21 at 00:21
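To illustrate the last two comments: comparing a Series with its own shift() works for non-numeric data, where the arithmetic in diff() does not apply, and the resulting Booleans can be cumulatively summed directly. A minimal sketch on a hypothetical string Series s:

```python
import pandas as pd

# Hypothetical non-numeric series; subtracting strings (as diff() would) fails
s = pd.Series(['up', 'up', 'down', 'down', 'down', 'up'])

# True wherever the value differs from the previous row (the first row compares
# against NaN, so it is True); cumsum treats True as 1, giving a run label
run_id = (s != s.shift()).cumsum()
print(run_id.tolist())
```

This prints [1, 1, 2, 2, 2, 3]: each run of identical strings gets its own integer label without any astype(int).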

score 5 · Answer 2 · answered Jun 27 '16 at 22:42

Here is an alternative solution:

# Consecutive = length of the run each row belongs to; named aggregation
# (pandas >= 0.25) yields flat column names directly
rslt = (df.assign(Consecutive=df.Value
                                .groupby((df.Value != df.Value.shift())
                                         .cumsum())
                                .transform('size'))
          .query('Consecutive > 1')     # keep only runs longer than one row
          .groupby('Consecutive')
          .agg(BeginDate=('Date', 'first'),
               EndDate=('Date', 'last'),
               No=('No', 'first'))
          .reset_index()
)

Demo:

In [226]: rslt
Out[226]:
   Consecutive  BeginDate    EndDate      No
0            2 1999-11-26 1999-11-29  600001
1            4 1999-11-19 1999-11-24  600000
2            6 1999-11-10 1999-11-17  600000
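One caveat with grouping by Consecutive: two separate runs that have the same length fall into the same group, so their dates get merged. It does not matter for the question's data, where the run lengths 6, 4 and 2 are all distinct. A sketch that groups by a run identifier instead, on small hypothetical data (the column name run is illustrative):

```python
import pandas as pd

# Hypothetical data with two separate runs of 1s of the same length (2)
df = pd.DataFrame({
    'No': [1, 1, 1, 1, 1],
    'Date': pd.to_datetime(['2000-01-03', '2000-01-04', '2000-01-05',
                            '2000-01-06', '2000-01-07']),
    'Value': [1, 1, 0, 1, 1],
})

# One label per run of identical Values
run_id = (df.Value != df.Value.shift()).cumsum()

rslt = (df.assign(run=run_id,
                  Consecutive=df.Value.groupby(run_id).transform('size'))
          .query('Value == 1 and Consecutive > 1')
          .groupby('run')                 # group by run, not by run length
          .agg(No=('No', 'first'),
               BeginDate=('Date', 'first'),
               EndDate=('Date', 'last'),
               Consecutive=('Consecutive', 'first'))
          .reset_index(drop=True))
print(rslt)
```

Grouping this data by Consecutive would instead return a single row spanning 2000-01-03 to 2000-01-07.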
