How to calculate the time difference (date and time are in the same column) and put this difference in a separate column?

Question

Faced a problem. There is a Dataframe in which I need to calculate how much time has passed between operations for each user and indicate this difference in a separate column in this table. It turns out to calculate the time separately using DateTime, but how to do it in the table?

table = pd.DataFrame({
    'user': ['Steve', 'Steve', 'Steve', 'Jack', 'Jack', 'Jack'],
    'country':['UK', 'UK', 'UK', 'CH', 'CH', 'CH'],
    'date': ['2018-01-15 00:05:07', '2018-01-15 00:06:14', '2018-01-15 00:08:36',
             '2018-01-15 00:14:51', '2018-01-15 00:15:18', '2018-01-15 00:17:24']
})
table.set_index('country', inplace=True)

for i in table.groupby(['country', 'user']):
    print(i)

In a separate column, you should get:

For Jack

00:14:51
00:00:27
00:02:06

For Steve

00:05:07
00:01:07
00:02:22

jezrael · Answer 1 · 2021-09-16T06:27:16.703

Use DataFrameGroupBy.diff for difference per groups - ouput are timedeltas:

table['date'] = pd.to_datetime(table['date'])

s = pd.to_timedelta(table['date'].dt.strftime('%H:%M:%S'))
table['new'] = table.groupby(['country', 'user'])['date'].diff().fillna(s)

print (table)
          user                date             new
country                                           
UK       Steve 2018-01-15 00:05:07 0 days 00:05:07
UK       Steve 2018-01-15 00:06:14 0 days 00:01:07
UK       Steve 2018-01-15 00:08:36 0 days 00:02:22
CH        Jack 2018-01-15 00:14:51 0 days 00:14:51
CH        Jack 2018-01-15 00:15:18 0 days 00:00:27
CH        Jack 2018-01-15 00:17:24 0 days 00:02:06

If need convert timedeltas to strings:

def format_timedelta(x):
    ts = x.total_seconds()
    hours, remainder = divmod(ts, 3600)
    minutes, seconds = divmod(remainder, 60)
    return ('{:02d}:{:02d}:{:02d}').format(int(hours), int(minutes), int(seconds)) 

table['new'] = table['new'].apply(format_timedelta)

print (table)
          user                date       new
country                                     
UK       Steve 2018-01-15 00:05:07  00:05:07
UK       Steve 2018-01-15 00:06:14  00:01:07
UK       Steve 2018-01-15 00:08:36  00:02:22
CH        Jack 2018-01-15 00:14:51  00:14:51
CH        Jack 2018-01-15 00:15:18  00:00:27
CH        Jack 2018-01-15 00:17:24  00:02:06

I think better is 0 for first value per group:

table['date'] = pd.to_datetime(table['date'])

table['new'] = table.groupby(['country', 'user'])['date'].diff().fillna(pd.Timedelta(0))

print (table)
          user                date             new
country                                           
UK       Steve 2018-01-15 00:05:07 0 days 00:00:00
UK       Steve 2018-01-15 00:06:14 0 days 00:01:07
UK       Steve 2018-01-15 00:08:36 0 days 00:02:22
CH        Jack 2018-01-15 00:14:51 0 days 00:00:00
CH        Jack 2018-01-15 00:15:18 0 days 00:00:27
CH        Jack 2018-01-15 00:17:24 0 days 00:02:06

I posted part of the dataframe here. But when I use your solution in Jupiter's notebook, on a full dataframe, it doesn't load. Only shows "*" — Rinat Devyatyarov, Sep 16 '21 at 08:06
@RinatDevyatyarov - Can you check [this](https://stackoverflow.com/questions/49188960/how-to-show-all-columns-names-on-a-large-pandas-dataframe) ? — jezrael, Sep 16 '21 at 08:23

How to calculate the time difference (date and time are in the same column) and put this difference in a separate column?

1 Answers1