7

I have a pandas dataframe as follows

Dev_id     Time
88345      13:40:31
87556      13:20:33
88955      13:05:00
.....      ........
85678      12:15:28

The above dataframe has 83000 rows. I want to take time difference between two consecutive rows and keep it in a separate column. The desired result would be

Dev_id    Time          Time_diff(in min)
88345      13:40:31      20
87556      13:20:33      15
88955      13:05:00      15

I have tried df['Time_diff'] = df['Time'].diff(-1) but getting error as shown below

TypeError: unsupported operand type(s) for -: 'datetime.time' and 'datetime.time'

How to solve this

pythondumb
  • 1,187
  • 1
  • 15
  • 30
  • 1
    [Calculate Pandas DataFrame Time Difference Between Two Columns in Hours and Minutes](https://stackoverflow.com/q/22923775/7758804) is an associated question. – Trenton McKinney Jan 31 '21 at 23:06

2 Answers2

14

Problem is pandas need datetimes or timedeltas for diff function, so first converting by to_timedelta, then get total_seconds and divide by 60:

df['Time_diff'] = pd.to_timedelta(df['Time'].astype(str)).diff(-1).dt.total_seconds().div(60)
#alternative
#df['Time_diff'] = pd.to_datetime(df['Time'].astype(str)).diff(-1).dt.total_seconds().div(60)
print (df)
   Dev_id      Time  Time_diff
0   88345  13:40:31  19.966667
1   87556  13:20:33  15.550000
2   88955  13:05:00  49.533333
3   85678  12:15:28        NaN

If want floor or round per minutes:

df['Time_diff'] = (pd.to_timedelta(df['Time'].astype(str))
                     .diff(-1)
                     .dt.floor('T')
                     .dt.total_seconds()
                     .div(60))
print (df)
   Dev_id      Time  Time_diff
0   88345  13:40:31       19.0
1   87556  13:20:33       15.0
2   88955  13:05:00       49.0
3   85678  12:15:28        NaN
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • can't it be `df['Time_diff'] = (pd.to_timedelta(df['Time'])).diff(-1).dt.floor('T').dt.total_seconds().div(60)` , i believe we don't require `astype(str) , or First convert `df['Time'] = pd.to_timedelta(df['Time'])` then `df['Time_Diff'] = df['Time'].diff(-1).dt.total_seconds().div(60)` Just a thought. – Karn Kumar Jan 03 '19 at 11:20
  • 1
    @pygo - problem is with error `TypeError: unsupported operand type(s) for -: 'datetime.time' and 'datetime.time'` - it means there are times, no strings, so necessary convering. For testing `df['Time'] = pd.to_datetime(df['Time']).dt.time` – jezrael Jan 03 '19 at 11:22
0

You should first convert / cast df['Time'] column to pd.Timedelta and then do the substraction

Petros
  • 342
  • 3
  • 15