Following the answers here and here, I first converted the columns to time objects:
data['start'] = pd.to_datetime(data_session['start'], format = '%H:%M:%S').dt.time
data['end'] = pd.to_datetime(data['end'], format = '%H:%M:%S').dt.time
data['minutes'] = (data['end'] - data['start']).dt.minutes
data['Hour'] = data['start'].dt.hour
I get this error:
TypeError: unsupported operand type(s) for -: 'datetime.time' and 'datetime.time'
I checked the data frame info:
data.info()
start 10000 non-null object
end 10000 non-null object
The column is still an object type. Why doesn't it convert to datetime64? Why am I not able to access it using the dt accessor?
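For reference, here is a minimal sketch that reproduces what seems to be happening. It assumes a small made-up frame with the same `%H:%M:%S` strings (the values are hypothetical, not my real data). As far as I can tell, `.dt.time` returns plain Python `datetime.time` objects, which pandas stores in an `object` column, so the result has no `.dt` accessor and does not support subtraction.

```python
import datetime
import pandas as pd

# Hypothetical sample data in the same string format as the question
data = pd.DataFrame({
    "start": ["06:10:10", "23:00:00"],
    "end": ["06:40:40", "23:45:30"],
})

# Parsing gives a datetime64 column (with a default 1900-01-01 date)
parsed = pd.to_datetime(data["start"], format="%H:%M:%S")
start_dtype = str(parsed.dtype)

# .dt.time converts each value to a Python datetime.time object,
# so the resulting column falls back to the generic object dtype
as_time = parsed.dt.time
time_dtype = str(as_time.dtype)
first_is_time = isinstance(as_time.iloc[0], datetime.time)

# The .dt accessor only works on datetime-like columns
has_dt = True
try:
    as_time.dt
except AttributeError:
    has_dt = False

print(start_dtype, time_dtype, first_is_time, has_dt)
```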
My last try was:
data['start'] = pd.to_datetime(data_session['start'], format = '%H:%M:%S')
data['end'] = pd.to_datetime(data['end'], format = '%H:%M:%S')
data['minutes'] = (data['end'] - data['start'])
data.info()
start 10000 non-null datetime64[ns]
end 10000 non-null datetime64[ns]
This worked partially: I got the time difference, but my start and end columns now include an additional date.
e.g: 06:10:10 -> 1900-01-01 06:10:10
My goals are:
- Make a new column with only the hour of one of the series
- Make a new column with time difference in minutes
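A rough sketch of what I am aiming for, again on hypothetical sample values: parse into temporary variables, derive both new columns from them, and leave the original string columns untouched so no `1900-01-01` date shows up. This assumes `end` is always later than `start` on the same day; intervals that cross midnight would come out negative and need extra handling.

```python
import pandas as pd

# Hypothetical sample data in the same string format as the question
data = pd.DataFrame({
    "start": ["06:10:10", "23:00:00"],
    "end": ["06:40:40", "23:45:30"],
})

# Parse into temporary Series instead of overwriting the columns
start = pd.to_datetime(data["start"], format="%H:%M:%S")
end = pd.to_datetime(data["end"], format="%H:%M:%S")

# Goal 1: hour of one of the series
data["Hour"] = start.dt.hour

# Goal 2: time difference in minutes (Timedelta -> seconds -> minutes)
data["minutes"] = (end - start).dt.total_seconds() / 60

print(data)
```

Since the parsed values only live in `start` and `end`, the `data['start']` and `data['end']` columns keep their original `HH:MM:SS` text.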