Python 3 Pandas csv log file calculate an end time column when we only have start times

Question

This one is a little hard to explain but I have a log file with different start times with users sharing a device. Am trying to add a column that calculates the end time based on the next person's start time. Not sure what this concept is called but looks similar to Splitting time column into start time / end time columns except with python 3 instead of sql.

The dataframe looks like:

Team    Device_Num     User     Start_Time
A       1234           Joe      1/18/2019 8:36
A       1234           Sue      1/18/2019 9:16
A       1234           Mary     2/1/2019 12:26
A       1234           Joe      2/1/2019 14:04
B       3333           Jerry    1/17/2019 20:27
C       5555           Bob      1/15/2019 3:08
C       5555           Tim      1/17/2019 19:58
C       4444           Tim      1/18/2019 07:35

Trying to create a new column that assumes the end time based on the unique combination of the other columns, kind of like a library book so that:

 End_Time
 1/18/2019 9:15  #<-- Based on Sue's start time - 1 min
 2/1/2019 12:25  #<-- Based on Mary's start time - 1 min
 2/1/2019 14:03  #<-- Based on Joe's start time - 1 min
 today()         #<-- Joe still has it
 today()         #<-- Jerry still has it
 1/17/2019 19:57 #<-- Based on Tim's start time - 1 min
 today()         #<-- Tim still has it
 today()         #<-- Tim has this unique device.

Not sure where to begin on the coding, help is always appreciated.

score 0 · Answer 1 · answered Feb 22 '19 at 03:56

Continued to work on it and found the answer under a groupby/apply (shift) operation:

Sort df so that Device_Num is ascending and Start_time is ascending.

# Lag (groupby/shift) function
df['End_Time'] = df.groupby(['Device_Num'])['Start_Time'].shift(-1)

# Alternative to take into account other columns.
df['End_Time'] = df.groupby(['Device_Num', 'Team'])['Start_Time'].shift(-1)

# Offset time by 1 minute
df['End_Time'] = df.End_Time - pd.DateOffset(minutes = 1)

# Fill in blanks with now()
df.End_Time.fillna(pd.datetime.now().strftime("%m/%d/%Y %I:%M"), inplace=True)

Python 3 Pandas csv log file calculate an end time column when we only have start times

1 Answers1