I have a data frame with data:
- Transaction group - superior
- Application number - could be different in one transaction group
- Created time – data time
- Status - application or calculation.
And I want to calculate time between Created time for each pair in status application – calculation. Data doesn’t have column ‘pair’ , to be better understand I labeled them A, B, C in order.
pairs Transaction group Application number Created time Status
A 221110000363 HB202211100902000100 2022-11-10 09:05 application
A 221110000363 HB202211100902000100 2022-11-10 13:42 calculation
B 221110000363 HB202211100902000200 2022-11-10 14:02 application
B 221110000363 HB202211100902000200 2022-11-10 14:07 calculation
C 221110000363 HB202211100902000200 2022-11-10 14:43 application
C 221110000363 HB202211100902000200 2022-11-10 15:03 calculation
I was trying to split it on two df: one with all applications for transaction group - df1:
pairs Transaction group Application number start_time Status
A 221110000363 HB202211100902000100 2022-11-10 09:05 application
B 221110000363 HB202211100902000200 2022-11-10 14:02 application
C 221110000363 HB202211100902000200 2022-11-10 14:43 application
and another with all calculations - df2:
pairs Transaction group Application number stop_time Status
A 221110000363 HB202211100902000100 2022-11-10 13:42 calculation
B 221110000363 HB202211100902000200 2022-11-10 14:07 calculation
C 221110000363 HB202211100902000200 2022-11-10 15:03 calculation
Created time in df1 I renamed to ‘statrt_time’ and in df2 to ‘stop_time’ But then I don’t know how to merge this two data frames that in one row I will have one pair with two times – start and stop. It is also important that it must be group by ‘Transaction group’ and ‘Application number’.
I’m looking for result like that:
pairs Transaction group Application number start_time stop_time
A 221110000363 HB202211100902000100 2022-11-10 09:05 2022-11-10 13:42
B 221110000363 HB202211100902000200 2022-11-10 14:02 2022-11-10 14:07
C 221110000363 HB202211100902000200 2022-11-10 14:43 2022-11-10 15:03
EDIT:
Logic to pairs:
In one transaction group I'm looking for application number and for this I'm looking for first lowest datetime in column ‘Status’ where the value is ‘application’ and then the first lowest datetime where in column ‘Status’ value is ‘calculation’. This is the first pair. Second pair is second lowest datetime in Status - application and then the second lowest datetime in Status - calculation. This is the second pair.
EDIT2
Data:
Application number Created time Transaction group Status
HB202211100902000100 2022-11-10 09:05 221110000363 application
HB202211100902000100 2022-11-10 13:42 221110000363 calculation
HB202211100902000100 2022-11-10 14:02 221110000363 application
HB202211100902000100 2022-11-10 14:07 221110000363 calculation
HB202211100902000100 2022-11-10 14:43 221110000363 application
HB202211100902000100 2022-11-10 15:03 221110000363 calculation
HB202211100902000100 2022-11-10 16:24 221110000363 application
HB202205152239000200 2022-05-15 22:53 220515000252 application
HB202205152239000200 2022-05-16 14:57 220515000252 calculation
HB202205152253000100 2022-05-15 22:56 220515000252 application
HB202205152253000100 2022-05-16 14:56 220515000252 calculation
HB202205152257000100 2022-05-15 22:57 220515000252 application
HB202205152257000100 2022-05-16 14:56 220515000252 calculation
HB202205152259000100 2022-05-15 23:00 220515000252 application
HB202205152259000100 2022-05-16 14:56 220515000252 calculation
HB202205152301000100 2022-05-15 23:02 220515000252 application
HB202205152301000100 2022-05-16 14:56 220515000252 calculation
HB202205152302000100 2022-05-15 23:04 220515000252 application
HB202205152302000100 2022-05-16 14:55 220515000252 calculation
HB202205152305000100 2022-05-15 23:07 220515000252 application
HB202205152305000100 2022-05-16 14:55 220515000252 calculation
HB202205152307000100 2022-05-15 23:09 220515000252 application
HB202205152307000100 2022-05-16 14:54 220515000252 calculation
HB202205152312000100 2022-05-15 23:13 220515000252 application
HB202205152312000100 2022-05-16 14:54 220515000252 calculation
HB202205152313000100 2022-05-15 23:17 220515000252 application
HB202205152313000100 2022-05-16 14:54 220515000252 calculation
So I use this:
> df['pairs'] = df.groupby(['Status']).cumcount()
And my data looks like :
Application number Created time Transaction group Status pairs
0 HB202211100902000100 2022-11-10 09:05:28 221110000363 application 0
1 HB202211100902000100 2022-11-10 13:42:44 221110000363 calculation 0
2 HB202211100902000100 2022-11-10 14:02:43 221110000363 application 1
3 HB202211100902000100 2022-11-10 14:07:16 221110000363 calculation 1
4 HB202211100902000100 2022-11-10 14:43:54 221110000363 application 2
5 HB202211100902000100 2022-11-10 15:03:06 221110000363 calculation 2
6 HB202211100902000100 2022-11-10 16:24:26 221110000363 application 3
7 HB202205152239000200 2022-05-15 22:53:18 220515000252 application 4
8 HB202205152239000200 2022-05-16 14:57:14 220515000252 calculation 3
9 HB202205152253000100 2022-05-15 22:56:11 220515000252 application 5
10 HB202205152253000100 2022-05-16 14:56:55 220515000252 calculation 4
11 HB202205152257000100 2022-05-15 22:57:58 220515000252 application 6
12 HB202205152257000100 2022-05-16 14:56:47 220515000252 calculation 5
So the problem starts from 3rd pair.
After using pivot:
df = (df.pivot(index=['pairs', 'Transaction group', 'Application number'], columns='Status', values='Created time'))
Data looks that:
Status application calculation
pairs Transaction group Application number
0 221110000363 HB202211100902000100 2022-11-10 09:05:28 2022-11-10 13:42:44
1 221110000363 HB202211100902000100 2022-11-10 14:02:43 2022-11-10 14:07:16
2 221110000363 HB202211100902000100 2022-11-10 14:43:54 2022-11-10 15:03:06
3 220515000252 HB202205152239000200 NaT 2022-05-16 14:57:14
221110000363 HB202211100902000100 2022-11-10 16:24:26 NaT
4 220515000252 HB202205152239000200 2022-05-15 22:53:18 NaT
HB202205152253000100 NaT 2022-05-16 14:56:55
5 220515000252 HB202205152253000100 2022-05-15 22:56:11 NaT
HB202205152257000100 NaT 2022-05-16 14:56:47
So, in my opinion the most important thing is to make a good pairs.