0

I have a data frame as shown below

Doctor   Start                B_ID  Session      Finish
    A   2020-01-18 12:00:00     1    S1         2020-01-18 12:33:00
    A   2020-01-18 12:30:00     2    S1         2020-01-18 12:52:00
    A   2020-01-18 13:00:00     3    S1         2020-01-18 13:23:00
    A   2020-01-18 13:00:00     4    S1         2020-01-18 13:37:00
    A   2020-01-18 13:30:00     5    S1         2020-01-18 13:56:00
    A   2020-01-18 14:00:00     6    S1         2020-01-18 14:15:00
    A   2020-01-18 14:00:00     7    S1         2020-01-18 14:28:00
    A   2020-01-18 14:00:00     8    S1         2020-01-18 14:40:00
    A   2020-01-18 14:00:00     9    S1         2020-01-18 15:01:00
    A   2020-01-19 12:00:00    12    S2         2020-01-19 12:20:00
    A   2020-01-19 12:30:00    13    S2         2020-01-19 12:40:00 
    A   2020-01-19 13:00:00    14    S2         2020-01-19 13:20:00
    A   2020-01-19 13:30:00    15    S2         2020-01-19 13:40:00
    A   2020-01-19 14:00:00    16    S2         2020-01-19 14:10:00
    A   2020-01-19 14:00:00    17    S2         2020-01-19 14:20:00
    A   2020-01-19 14:00:00    19    S2         2020-01-19 14:40:00

From the above data frame I would like to create a column called "Actual_start_time" based on the previous finish and current Start time for the same session.

Steps

if previous finish > current start:
   df['Actual_start'] = df['previous finish']
else:
    df['Actual_start'] = df['Start']

Expected output:

Doctor   Start                B_ID  Session      Finish                 Actual_start
    A   2020-01-18 12:00:00     1    S1         2020-01-18 12:33:00   2020-01-18 12:00:00
    A   2020-01-18 12:30:00     2    S1         2020-01-18 12:52:00   2020-01-18 12:33:00
    A   2020-01-18 13:00:00     3    S1         2020-01-18 13:23:00   2020-01-18 13:00:00
    A   2020-01-18 13:00:00     4    S1         2020-01-18 13:37:00   2020-01-18 13:23:00
    A   2020-01-18 13:30:00     5    S1         2020-01-18 13:56:00   2020-01-18 13:37:00
    A   2020-01-18 14:00:00     6    S1         2020-01-18 14:15:00   2020-01-18 14:00:00
    A   2020-01-18 14:00:00     7    S1         2020-01-18 14:28:00   2020-01-18 14:15:00
    A   2020-01-18 14:00:00     8    S1         2020-01-18 14:40:00   2020-01-18 14:28:00
    A   2020-01-18 14:00:00     9    S1         2020-01-18 15:01:00   2020-01-18 14:40:00
    A   2020-01-19 12:00:00    12    S2         2020-01-19 12:20:00   2020-01-19 12:00:00
    A   2020-01-19 12:30:00    13    S2         2020-01-19 12:40:00   2020-01-19 12:30:00
    A   2020-01-19 13:00:00    14    S2         2020-01-19 13:20:00   2020-01-19 13:00:00
    A   2020-01-19 13:30:00    15    S2         2020-01-19 13:40:00   2020-01-19 13:30:00
    A   2020-01-19 14:00:00    16    S2         2020-01-19 14:10:00   2020-01-19 14:00:00
    A   2020-01-19 14:00:00    17    S2         2020-01-19 14:20:00   2020-01-19 14:10:00
    A   2020-01-19 14:00:00    19    S2         2020-01-19 14:40:00   2020-01-19 14:20:00
Danish
  • 2,719
  • 17
  • 32

1 Answers1

1

Use DataFrameGroupBy.shift and replace first values of group by column Start by Series.fillna:

df['Start'] = pd.to_datetime(df['Start'])
df['Finish'] = pd.to_datetime(df['Finish'])

s = df.groupby('Session')['Finish'].shift()
df['Actual_start'] = np.where(s.gt(df['Start']), s, df['Start'])

print (df)
   Doctor               Start  B_ID Session              Finish  \
0       A 2020-01-18 12:00:00     1      S1 2020-01-18 12:33:00   
1       A 2020-01-18 12:30:00     2      S1 2020-01-18 12:52:00   
2       A 2020-01-18 13:00:00     3      S1 2020-01-18 13:23:00   
3       A 2020-01-18 13:00:00     4      S1 2020-01-18 13:37:00   
4       A 2020-01-18 13:30:00     5      S1 2020-01-18 13:56:00   
5       A 2020-01-18 14:00:00     6      S1 2020-01-18 14:15:00   
6       A 2020-01-18 14:00:00     7      S1 2020-01-18 14:28:00   
7       A 2020-01-18 14:00:00     8      S1 2020-01-18 14:40:00   
8       A 2020-01-18 14:00:00     9      S1 2020-01-18 15:01:00   
9       A 2020-01-19 12:00:00    12      S2 2020-01-19 12:20:00   
10      A 2020-01-19 12:30:00    13      S2 2020-01-19 12:40:00   
11      A 2020-01-19 13:00:00    14      S2 2020-01-19 13:20:00   
12      A 2020-01-19 13:30:00    15      S2 2020-01-19 13:40:00   
13      A 2020-01-19 14:00:00    16      S2 2020-01-19 14:10:00   
14      A 2020-01-19 14:00:00    17      S2 2020-01-19 14:20:00   
15      A 2020-01-19 14:00:00    19      S2 2020-01-19 14:40:00   

          Actual_start  
0  2020-01-18 12:00:00  
1  2020-01-18 12:33:00  
2  2020-01-18 13:00:00  
3  2020-01-18 13:23:00  
4  2020-01-18 13:37:00  
5  2020-01-18 14:00:00  
6  2020-01-18 14:15:00  
7  2020-01-18 14:28:00  
8  2020-01-18 14:40:00  
9  2020-01-19 12:00:00  
10 2020-01-19 12:30:00  
11 2020-01-19 13:00:00  
12 2020-01-19 13:30:00  
13 2020-01-19 14:00:00  
14 2020-01-19 14:10:00  
15 2020-01-19 14:20:00  
Danish
  • 2,719
  • 17
  • 32
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • @ALI - Do you think last 3 values? I test it and it seems correct, because in last 3 values are shifted values less like `start` column, so not `True` in mask. Can you verify it? – jezrael Apr 08 '20 at 13:22
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/211220/discussion-between-ali-and-jezrael). – Danish Apr 08 '20 at 13:35