1

I have a dataframe as below:

CallID StorageDate CloseDate Time Delta
1 2023-02-08 14:35:09 2023-02-08 14:35:56
1 2023-02-08 14:35:56 2023-02-08 14:42:00 value
2 2023-02-07 10:17:18 2023-02-07 10:22:23
2 2023-02-07 10:22:23 2023-02-07 15:09:14
2 2023-02-07 15:09:14 2023-02-07 16:20:50
2 2023-02-07 16:20:49 2023-02-08 09:23:16
2 2023-02-08 09:23:16 2023-02-08 09:27:21 value
3 2023-03-10 10:31:25 2023-03-10 10:41:37
3 2023-03-10 10:41:37 2023-03-10 14:23:18 value

To achieve the Time Delta, I am doing the following:

delta_time = a.iloc[-1]['CloseDate'] - a.iloc[0]['StorageDate']

I need to subtract the last CloseDate from the first StorageDate for each CallID (a total of 16821), and the delta_time must go in the last row of each CallID, where there is value (the same as I get the CloseDate from).

I'm doing as follows:

callid = 1
while callid <= 16821:
    df1 = df1[df1['CallID'] == callid]
    delta_time = df1.iloc[-1]['CloseDate'] - df1.iloc[0]['StorageDate']
    callid += 1

But the problem is that I'm not being abble to parse the delta_time value to the correct row.

Before I tried doing with loc and iloc, and I managed to send it to the correct row in df1 with the following structure:

 delta_time = df1.iloc[-1]['CloseDate'] - df1.iloc[0]['StorageDate']
 df1.loc[1, 'Time Delta'] = delta_time

It works, but it's unefficient since I have to change the value inside the loc for every different CallID and iloc[-1] doesn't seem to work. Moreover, I don't know how to parse it to the main dataframe and not only the one I created to do the math.

Can anybody help me here?

2 Answers2

2

Use groupby.transform and where:

df[['StorageDate', 'CloseDate']] = df[['StorageDate', 'CloseDate']].apply(pd.to_datetime)

g = df.groupby('CallID')

df['Time Delta'] = (g['CloseDate'].transform('last')
                    .sub(g['StorageDate'].transform('first'))
                    .where(~df['CallID'].duplicated(keep='last'))
                   )

Output:

   CallID         StorageDate           CloseDate      Time Delta
0       1 2023-02-08 14:35:09 2023-02-08 14:35:56             NaT
1       1 2023-02-08 14:35:56 2023-02-08 14:42:00 0 days 00:06:51
2       2 2023-02-07 10:17:18 2023-02-07 10:22:23             NaT
3       2 2023-02-07 10:22:23 2023-02-07 15:09:14             NaT
4       2 2023-02-07 15:09:14 2023-02-07 16:20:50             NaT
5       2 2023-02-07 16:20:49 2023-02-08 09:23:16             NaT
6       2 2023-02-08 09:23:16 2023-02-08 09:27:21 0 days 23:10:03
7       3 2023-03-10 10:31:25 2023-03-10 10:41:37             NaT
8       3 2023-03-10 10:41:37 2023-03-10 14:23:18 0 days 03:51:53

Reproducible input:

df = pd.DataFrame({'CallID': [1, 1, 2, 2, 2, 2, 2, 3, 3],
                   'StorageDate': ['2023-02-08 14:35:09', '2023-02-08 14:35:56', '2023-02-07 10:17:18', '2023-02-07 10:22:23', '2023-02-07 15:09:14', '2023-02-07 16:20:49', '2023-02-08 09:23:16', '2023-03-10 10:31:25', '2023-03-10 10:41:37'],
                   'CloseDate': ['2023-02-08 14:35:56', '2023-02-08 14:42:00', '2023-02-07 10:22:23', '2023-02-07 15:09:14', '2023-02-07 16:20:50', '2023-02-08 09:23:16', '2023-02-08 09:27:21', '2023-03-10 10:41:37', '2023-03-10 14:23:18']})

df[['StorageDate', 'CloseDate']] = df[['StorageDate', 'CloseDate']].apply(pd.to_datetime)
mozway
  • 194,879
  • 13
  • 39
  • 75
  • In my case, even the ones that should have a value appear as NaT. – Gustavo Felippe de Oliveira Mar 15 '23 at 13:13
  • @Gustavo I added a reproducible input, can you first check that this works with that? If not, please provide your pandas version. If this does work but not your real data, please update the example – mozway Mar 15 '23 at 13:16
  • Is it supposed to print the dataframe? If so, yes, it worked. – Gustavo Felippe de Oliveira Mar 15 '23 at 13:17
  • @Gustavo can you test my code on it. The other answer used the exact same logic as mine, so I'm very surprised that this wouldn't work. Make sure to use the latest version – mozway Mar 15 '23 at 13:18
  • now it worked just fine :) Thanks! Another question: how can I change time delta to only have data as hh:mm:ss and not "x days hh:mm:ss"? – Gustavo Felippe de Oliveira Mar 15 '23 at 13:22
  • @Gustavo it depends if you can have a timedelta greater than 24h. Is this the case? Then do you want to truncate or have hours greater than 24? There is already a question on that – mozway Mar 15 '23 at 13:25
  • @Gustavo found it: https://stackoverflow.com/questions/538666/format-timedelta-to-string – mozway Mar 15 '23 at 13:27
1

Use Series.duplicated for filter last rows generated by GroupBy.transform:

m = ~df['CallID'].duplicated(keep='last')

g = df.groupby('CallID')

df.loc[m, 'Time Delta'] = (g['CloseDate'].transform('last')[m]
                                         .sub(g['StorageDate'].transform('first')[m]))
print (df)
   CallID         StorageDate           CloseDate       Time Delta
0       1 2023-02-08 14:35:09 2023-02-08 14:35:56              NaN
1       1 2023-02-08 14:35:56 2023-02-08 14:42:00  0 days 00:06:51
2       2 2023-02-07 10:17:18 2023-02-07 10:22:23              NaN
3       2 2023-02-07 10:22:23 2023-02-07 15:09:14              NaN
4       2 2023-02-07 15:09:14 2023-02-07 16:20:50              NaN
5       2 2023-02-07 16:20:49 2023-02-08 09:23:16              NaN
6       2 2023-02-08 09:23:16 2023-02-08 09:27:21  0 days 23:10:03
7       3 2023-03-10 10:31:25 2023-03-10 10:41:37              NaN
8       3 2023-03-10 10:41:37 2023-03-10 14:23:18  0 days 03:51:53

Another solution with aggregate by GroupBy.agg with mapping difference:

def f(x):
    ts = x.total_seconds()
    hours, remainder = divmod(ts, 3600)
    minutes, seconds = divmod(remainder, 60)
    return ('{:02d}:{:02d}:{:02d}').format(int(hours), int(minutes), int(seconds))

m = ~df['CallID'].duplicated(keep='last')

df1 = df.groupby('CallID').agg({'CloseDate':'last', 'StorageDate':'first'})

df.loc[m, 'Time Delta'] = (df.loc[m, 'CallID'].map(df1['CloseDate'].sub(df1['StorageDate']))
                             .apply(f))
print (df)
   CallID         StorageDate           CloseDate Time Delta
0       1 2023-02-08 14:35:09 2023-02-08 14:35:56        NaN
1       1 2023-02-08 14:35:56 2023-02-08 14:42:00   00:06:51
2       2 2023-02-07 10:17:18 2023-02-07 10:22:23        NaN
3       2 2023-02-07 10:22:23 2023-02-07 15:09:14        NaN
4       2 2023-02-07 15:09:14 2023-02-07 16:20:50        NaN
5       2 2023-02-07 16:20:49 2023-02-08 09:23:16        NaN
6       2 2023-02-08 09:23:16 2023-02-08 09:27:21   23:10:03
7       3 2023-03-10 10:31:25 2023-03-10 10:41:37        NaN
8       3 2023-03-10 10:41:37 2023-03-10 14:23:18   03:51:53
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252