import pandas as pd
stoptimes_df = pd.DataFrame({
    'trip_id': ['1', '1', '1', '2', '2', '2'],
    'arrival_time': ["12:10:00", "12:20:00", "12:30:00", "27:32:00", "27:39:00", "27:45:00"],
    'departure_time': ["12:10:00", "12:20:00", "12:30:00", "27:32:00", "27:39:00", "27:45:00"],
    'stop_id': ['de:08437:48835:0:2', 'de:08426:6306', 'de:08426:6307', 'de:08116:6703', 'de:08116:3821', 'de:08415:28256:0:1']})
I have the DataFrame above, which lists bus trips (trip_id) and their stops. I would like to add a new column that contains, for each stop, the difference between its arrival time and the departure time at the previous stop of the same trip.

Unfortunately, I can't get this to work. When I convert the columns to datetime.time, I can't do any arithmetic with them. That only works with datetime.datetime, but then the "arrival_time" and "departure_time" columns also contain a date, e.g. "1900-01-01 12:10:00", which I don't want. I have a similar problem when I use timedelta.

In short: I want to keep only the times, without a date, in the two existing columns, and the new column should contain the time difference in minutes or seconds. For example, for the last row the new column should show 6 (min) or 360 (s). Does anyone know how to do this?
What I have tried so far:
from datetime import datetime

def convert_to_datetime(time):
    hours, minutes, seconds = map(int, time.split(':'))
    hours = hours % 24  # wrap hours >= 24 into the 0-23 range
    time = str(hours) + ':' + str(minutes) + ':' + str(seconds)
    time = datetime.strptime(time, "%H:%M:%S").time()
    # time_delta = timedelta(hours=hours, minutes=minutes, seconds=seconds)
    return time

stoptimes_df['arrival_time'] = stoptimes_df['arrival_time'].apply(convert_to_datetime)
stoptimes_df['departure_time'] = stoptimes_df['departure_time'].apply(convert_to_datetime)
stoptimes_df
# first attempt: per-trip difference on a single column only
stoptimes_df['time_btw_stops'] = stoptimes_df.groupby('trip_id')['arrival_time'].diff()
stoptimes_df
This leads to the following error:
TypeError: unsupported operand type(s) for -: 'datetime.time' and 'datetime.time'
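This error is expected behaviour: datetime.time represents a time of day and defines no subtraction, whereas datetime.timedelta represents a duration and does. The sketch below uses illustrative values only. It also shows why the hours % 24 step in convert_to_datetime is risky if, as the 27:xx values suggest, this is GTFS stop_times data in which hours >= 24 mean times after midnight of the service day: wrapping turns a 5-minute gap across midnight into a large negative one. The 23:58:00 / 24:03:00 stop pair is hypothetical and not taken from the data.

```python
from datetime import time, timedelta

import pandas as pd

# datetime.time is a time of day, not a duration, so it defines no subtraction.
try:
    time(12, 20) - time(12, 10)
    time_error = None
except TypeError as exc:
    time_error = str(exc)
print(time_error)

# datetime.timedelta is a duration: subtraction works, even with hours >= 24.
gap = timedelta(hours=27, minutes=45) - timedelta(hours=27, minutes=39)
print(gap.total_seconds())  # 360.0

# Wrapping hours modulo 24 (as convert_to_datetime does) breaks gaps across midnight.
# Hypothetical stop pair: departure 23:58:00, next arrival 24:03:00.
departure = timedelta(hours=23, minutes=58)
unwrapped = timedelta(hours=24, minutes=3) - departure
wrapped = timedelta(hours=24 % 24, minutes=3) - departure
print(unwrapped.total_seconds(), wrapped.total_seconds())  # 300.0 -86100.0

# In pandas, a timedelta column supports diff() and conversion to seconds.
arrivals = pd.Series([timedelta(hours=27, minutes=32),
                      timedelta(hours=27, minutes=39),
                      timedelta(hours=27, minutes=45)])
diff_seconds = arrivals.diff().dt.total_seconds()
print(diff_seconds.tolist())
```

Note that pandas displays a timedelta column holding these values as e.g. "1 days 03:45:00". Leaving the string columns as they are and converting only for the calculation avoids that.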