0

Faced a problem. There is a Dataframe in which I need to calculate how much time has passed between operations for each user and indicate this difference in a separate column in this table. It turns out to calculate the time separately using DateTime, but how to do it in the table?

table = pd.DataFrame({
    'user': ['Steve', 'Steve', 'Steve', 'Jack', 'Jack', 'Jack'],
    'country':['UK', 'UK', 'UK', 'CH', 'CH', 'CH'],
    'date': ['2018-01-15 00:05:07', '2018-01-15 00:06:14', '2018-01-15 00:08:36',
             '2018-01-15 00:14:51', '2018-01-15 00:15:18', '2018-01-15 00:17:24']
})
table.set_index('country', inplace=True)

for i in table.groupby(['country', 'user']):
    print(i)

In a separate column, you should get:

For Jack

  1. 00:14:51
  2. 00:00:27
  3. 00:02:06

For Steve

  1. 00:05:07
  2. 00:01:07
  3. 00:02:22

1 Answers1

0

Use DataFrameGroupBy.diff for difference per groups - ouput are timedeltas:

table['date'] = pd.to_datetime(table['date'])

s = pd.to_timedelta(table['date'].dt.strftime('%H:%M:%S'))
table['new'] = table.groupby(['country', 'user'])['date'].diff().fillna(s)

print (table)
          user                date             new
country                                           
UK       Steve 2018-01-15 00:05:07 0 days 00:05:07
UK       Steve 2018-01-15 00:06:14 0 days 00:01:07
UK       Steve 2018-01-15 00:08:36 0 days 00:02:22
CH        Jack 2018-01-15 00:14:51 0 days 00:14:51
CH        Jack 2018-01-15 00:15:18 0 days 00:00:27
CH        Jack 2018-01-15 00:17:24 0 days 00:02:06

If need convert timedeltas to strings:

def format_timedelta(x):
    ts = x.total_seconds()
    hours, remainder = divmod(ts, 3600)
    minutes, seconds = divmod(remainder, 60)
    return ('{:02d}:{:02d}:{:02d}').format(int(hours), int(minutes), int(seconds)) 

table['new'] = table['new'].apply(format_timedelta)

print (table)
          user                date       new
country                                     
UK       Steve 2018-01-15 00:05:07  00:05:07
UK       Steve 2018-01-15 00:06:14  00:01:07
UK       Steve 2018-01-15 00:08:36  00:02:22
CH        Jack 2018-01-15 00:14:51  00:14:51
CH        Jack 2018-01-15 00:15:18  00:00:27
CH        Jack 2018-01-15 00:17:24  00:02:06

I think better is 0 for first value per group:

table['date'] = pd.to_datetime(table['date'])

table['new'] = table.groupby(['country', 'user'])['date'].diff().fillna(pd.Timedelta(0))

print (table)
          user                date             new
country                                           
UK       Steve 2018-01-15 00:05:07 0 days 00:00:00
UK       Steve 2018-01-15 00:06:14 0 days 00:01:07
UK       Steve 2018-01-15 00:08:36 0 days 00:02:22
CH        Jack 2018-01-15 00:14:51 0 days 00:00:00
CH        Jack 2018-01-15 00:15:18 0 days 00:00:27
CH        Jack 2018-01-15 00:17:24 0 days 00:02:06
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • I posted part of the dataframe here. But when I use your solution in Jupiter's notebook, on a full dataframe, it doesn't load. Only shows "*" – Rinat Devyatyarov Sep 16 '21 at 08:06
  • @RinatDevyatyarov - Can you check [this](https://stackoverflow.com/questions/49188960/how-to-show-all-columns-names-on-a-large-pandas-dataframe) ? – jezrael Sep 16 '21 at 08:23