0

I am at a standstill with this problem. I outlined it in another question ( Creating data histograms/visualizations using ipython and filtering out some values ) which meandered a bit so I'd like to fix the question and give it more context since I am sure others must have a workaround for this or have the problem. I've also seen similar, not identical, questions asked and can't quite adapt any of the solutions thus far given.

I have columns in my data frame for Start Time and End Time and created a 'Duration' column for time lapsed. I'm using ipython.

The Start Time/End Time columns have fields that look like:

2014/03/30 15:45

A date and then a time in hh:mm

when I type:

pd.to_datetime('End Time') and pd.to_datetime('Start Time')

I get fields resulting that look like:

2014-03-30 15:45:00

same date but with hyphens and same time but with :00 seconds appended

I then decided to create a new column for the difference between the End and Start times. The 'Duration' or time lapsed column was created by typing in one command:

df['Duration'] = pd.to_datetime(df['End Time'])-pd.to_datetime(df['Start Time'])

The format of the fields in the duration column is:

01:14:00

no date just a time lapsed in the format hh:mm:ss

to indicate time lapsed or 74 mins in the above example.

When I type:

df.Duration.dtype

dtype('m8[ns]') is returned, whereas, when I type

df.Duration.head(4)

0 00:14:00 1 00:16:00 2 00:03:00 3 00:09:00 Name: Duration, dtype: timedelta64[ns]

is returned which seems to indicate a different dtype for Duration.

How can I convert the format I have in the Duration column to a single integer value of minutes (time lapsed)? I see no methods that I can use, I'd write a function but wouldn't know how to treat the input of hh:mm:ss. This must be a common requirement of data analysis, should I be going about converting these dates and times differently if my end goal is to get a single integer indicating minutes lapsed? Should I just be using Excel?... because I have so far spent a day on this problem and it should be a simple problem to solve.

**update: THANK YOU!! (Jeff and Dataswede) I added a column with the command:

df['Durationendminusstart'] = pd.to_timedelta(df.Duration,unit='ns').astype('timedelta64[m]')

which seems to give me the Duration (minutes lapsed) as wanted so that huge part is solved! What still is not clear is why there were two different dtypes for the same column depending how I asked, oh well right now it doesn't matter.**

Community
  • 1
  • 1
pythonista
  • 85
  • 1
  • 2
  • 9
  • read the docs here: http://pandas.pydata.org/pandas-docs/stable/timeseries.html#time-deltas – Jeff Aug 05 '14 at 17:42
  • I've read the docs, could you tell me how that helps? If A1 = 01:14:00 (which seems to be two different dtypes depending on how I ask), what am I to do to get 74? – pythonista Aug 05 '14 at 17:55
  • see the section under frequency conversion; its doing exactly what you want. http://pandas.pydata.org/pandas-docs/stable/timeseries.html#time-deltas-reductions – Jeff Aug 05 '14 at 17:56
  • Check out the answers provided to a similar question I asked: http://stackoverflow.com/questions/23543909/plotting-pandas-timedelta – DataSwede Aug 05 '14 at 17:59

0 Answers0