I am at a standstill with this problem. I outlined it in another question ( Creating data histograms/visualizations using ipython and filtering out some values ) which meandered a bit so I'd like to fix the question and give it more context since I am sure others must have a workaround for this or have the problem. I've also seen similar, not identical, questions asked and can't quite adapt any of the solutions thus far given.
I have columns in my data frame for Start Time and End Time and created a 'Duration' column for time lapsed. I'm using ipython.
The Start Time/End Time columns have fields that look like:
2014/03/30 15:45
A date and then a time in hh:mm
when I type:
pd.to_datetime('End Time') and pd.to_datetime('Start Time')
I get fields resulting that look like:
2014-03-30 15:45:00
same date but with hyphens and same time but with :00 seconds appended
I then decided to create a new column for the difference between the End and Start times. The 'Duration' or time lapsed column was created by typing in one command:
df['Duration'] = pd.to_datetime(df['End Time'])-pd.to_datetime(df['Start Time'])
The format of the fields in the duration column is:
01:14:00
no date just a time lapsed in the format hh:mm:ss
to indicate time lapsed or 74 mins in the above example.
When I type:
df.Duration.dtype
dtype('m8[ns]') is returned, whereas, when I type
df.Duration.head(4)
0 00:14:00 1 00:16:00 2 00:03:00 3 00:09:00 Name: Duration, dtype: timedelta64[ns]
is returned which seems to indicate a different dtype for Duration.
How can I convert the format I have in the Duration column to a single integer value of minutes (time lapsed)? I see no methods that I can use, I'd write a function but wouldn't know how to treat the input of hh:mm:ss. This must be a common requirement of data analysis, should I be going about converting these dates and times differently if my end goal is to get a single integer indicating minutes lapsed? Should I just be using Excel?... because I have so far spent a day on this problem and it should be a simple problem to solve.
**update: THANK YOU!! (Jeff and Dataswede) I added a column with the command:
df['Durationendminusstart'] = pd.to_timedelta(df.Duration,unit='ns').astype('timedelta64[m]')
which seems to give me the Duration (minutes lapsed) as wanted so that huge part is solved! What still is not clear is why there were two different dtypes for the same column depending how I asked, oh well right now it doesn't matter.**