1

I have a data fame like that :

Timestamp Value
2021-04-21 14:22:00 0
2021-04-21 14:23:00 0
2021-04-21 14:24:00 0
2021-04-21 14:25:00 1
2021-04-21 14:26:00 1
2021-04-21 14:27:00 0
2021-04-21 14:28:00 1
2021-04-21 14:29:00 1
2021-04-21 14:30:00 1

I wan to get the consecutive same values like that :

Begin Date Another header Consecutive values
2021-04-21 14:25:00 2021-04-21 14:26:00 2
2021-04-21 14:28:00 2021-04-21 14:30:00 3

I tried some solutions with numpy shift, or by using for / while loop, but it don't work ... Also, I saw some topics in stack overflow indeed.

Thanks !

Alan CUZON
  • 13
  • 3
  • 1
    Does this answer your question? [How to use pandas to find consecutive same data in time series](https://stackoverflow.com/questions/26911851/how-to-use-pandas-to-find-consecutive-same-data-in-time-series) – Bushmaster Oct 03 '22 at 09:53

2 Answers2

0

You can use a custom groupby.agg:

# identify rows with value=1
m = df['Value'].eq(1)

# filter, groupby consecutive values
out = (df[m].groupby((~m).cumsum())
             # aggregate with first/min, last/max date, and count
            .agg(**{'Begin Date': ('Timestamp', 'first'),    # or min
                    'Another header': ('Timestamp', 'last'), # or max
                    'Consecutive values': ('Timestamp', 'count')})
      )

output:

                Begin Date       Another header  Consecutive values
Value                                                              
3      2021-04-21 14:25:00  2021-04-21 14:26:00                   2
4      2021-04-21 14:28:00  2021-04-21 14:30:00                   3
mozway
  • 194,879
  • 13
  • 39
  • 75
0

You can first group the values by using df.diff(1) and .cumsum():

df['group'] = (df.Value.diff(1) != 0).cumsum()

Now we can use pd.groupby() on this column to create your new dataframe:

out = pd.DataFrame({'Begin Date' : df.groupby('group').Timestamp.first(), 
                    'Another header' : df.groupby('group').Timestamp.last(),
                    'Consecutive values' : df.groupby('group').size(),
                    'Value' : df.groupby('group').Value.first()}).reset_index(drop=True)

Output:

             Begin Date        Another header  Consecutive values  Value
0  2021-04-21 14:22:00   2021-04-21 14:24:00                    3      0
1  2021-04-21 14:25:00   2021-04-21 14:26:00                    2      1
2  2021-04-21 14:27:00   2021-04-21 14:27:00                    1      0
3  2021-04-21 14:28:00   2021-04-21 14:30:00                    3      1

I deliberately show here the complete result (including consecutive zeroes). If you wish to delete those you can filter by doing out[out.Value != 0]

T C Molenaar
  • 3,205
  • 1
  • 10
  • 26