How to do time diff in each group on Pandas in Python

Question

Here's the phony data:

df = pd.DataFrame({'email': ['u1','u1','u1','u2','u2','u2'],
              'timestamp': [3, 1, 5, 11, 15, 9]})

What I intend to retrieve is the time diff in each group of email. Thus, after sorting by timestamp in each group, the data should be:

the result should be:

u1  2  # 5-3
u1  2  # 3-1
u2  4  # 15-11
u2  2  # 11-9

Could anyone tell me what I should do next? Great thanks.

ayhan · Accepted Answer · 2016-07-24T12:21:08.690

3

df = pd.DataFrame({'email': ['u1','u1','u1','u2','u2','u2'],
                   'timestamp': [3, 1, 5, 11, 15, 9]})

(df.sort_values(['email', 'timestamp'], ascending=[True, False])
 .groupby('email')['timestamp']
 .diff(-1)
 .dropna())
Out: 
2    2.0
0    2.0
4    4.0
3    2.0
Name: timestamp, dtype: float64

To keep the email column:

df.sort_values(['email', 'timestamp'], ascending=[True, False], inplace=True)
df.assign(diff=df.groupby('email')['timestamp'].diff(-1)).dropna()
Out: 
  email  timestamp  diff
2    u1          5   2.0
0    u1          3   2.0
4    u2         15   4.0
3    u2         11   2.0

If you don't want the timestamp column you can add .drop('timestamp', axis=1) at the end.

edited Jul 24 '16 at 12:21

answered Jul 24 '16 at 12:05

ayhan

70,170
20
182
203

@MaxU The index also makes more sense with that. Thank you. – ayhan Jul 24 '16 at 12:11
Could we switch the index (2, 0, 4, 3) to the previous corresponding email column? Thanks – Judking Jul 24 '16 at 12:16
Hey, one more question, If I remove the `inplace=True` and concat the two prompts together, the result is incorrect, why is that? – Judking Jul 24 '16 at 12:50
Because in that case the grouped values are for the unsorted (original) dataframe. – ayhan Jul 24 '16 at 13:09

How to do time diff in each group on Pandas in Python

1 Answers1

Linked