1

Given a dataset like this:

values = ([ 'motorway' ] * 5) + ([ 'link' ] * 3) + ([ 'motorway' ] * 7)

df = pd.DataFrame.from_dict({
  'timestamp': pd.date_range(start='2018-1-1', end='2018-1-2', freq='s').tolist()[:len(values)],
  'road_type': values,
})
df.set_index('timestamp')
df['delta_t'] = (df['timestamp'] - df['timestamp'].shift()).fillna(0)

I want to have the max sums of delta_t per group of consecutive road_types; given that delta_t is going to be 1s in this sample case, I want to find motorway: 7s and link: 3s. In practice there will be more road_types, and delta_t will vary.

edit: the solution provided here seems similar but does not sum the times nor does it select the largest of each groups.

retorquere
  • 1,496
  • 1
  • 14
  • 27
  • Possible duplicate of [Pandas DataFrame: How to groupby consecutive values](https://stackoverflow.com/questions/40802800/pandas-dataframe-how-to-groupby-consecutive-values) – Niklas Mertsch Oct 20 '18 at 15:49

1 Answers1

1

Create a new column that labels each "run" of identical road types with a unique integer, then group by that column and sum:

df['run'] = (df['road_type'] != df['road_type'].shift()).astype(int).cumsum()

df
             timestamp road_type  delta_t  run
0  2018-01-01 00:00:00  motorway 00:00:00    1
1  2018-01-01 00:00:01  motorway 00:00:01    1
2  2018-01-01 00:00:02  motorway 00:00:01    1
3  2018-01-01 00:00:03  motorway 00:00:01    1
4  2018-01-01 00:00:04  motorway 00:00:01    1
5  2018-01-01 00:00:05      link 00:00:01    2
6  2018-01-01 00:00:06      link 00:00:01    2
7  2018-01-01 00:00:07      link 00:00:01    2
8  2018-01-01 00:00:08  motorway 00:00:01    3
9  2018-01-01 00:00:09  motorway 00:00:01    3
10 2018-01-01 00:00:10  motorway 00:00:01    3
11 2018-01-01 00:00:11  motorway 00:00:01    3
12 2018-01-01 00:00:12  motorway 00:00:01    3
13 2018-01-01 00:00:13  motorway 00:00:01    3
14 2018-01-01 00:00:14  motorway 00:00:01    3


df.groupby('run').agg({'road_type': 'first', 'delta_t': 'sum'}).reset_index(drop=True).groupby('road_type').max()

           delta_t
road_type         
link      00:00:03
motorway  00:00:07
Peter Leimbigler
  • 10,775
  • 1
  • 23
  • 37