2

I have read in a csv into a dataframe like this:

import pandas as pd

data = pd.read_csv('mydata.csv', encoding = 'ISO-8859-1')

and the dataframe looks like this:

    first_time     second_time

1   11/2/17 0:05   11/2/17 0:09

2   12/2/17 0:40   12/2/17 0:17

3   13/2/17 0:12   13/2/17 1:40

And I would like to make a new column called duration, which finds the duration(minutes) of both columns on each row:

    first_time     second_time    duration

1   11/2/17 0:05   11/2/17 0:09   4

2   12/2/17 0:40   12/2/17 0:47   7

3   13/2/17 0:12   13/2/17 1:40   88

How could I do this efficiently? Normally I would try and parse the date, and extract the time values, but this doesn't seem very efficient.

Simon
  • 9,762
  • 15
  • 62
  • 119
RoadRunner
  • 25,803
  • 6
  • 42
  • 75

1 Answers1

3

You can just convert to datetime type and calculate the delta in minutes:

import pandas as pd

data = pd.DataFrame({'first_time': ['11/2/17 0:05', '12/2/17 0:40', '13/2/17 0:12'],
                     'second_time': ['11/2/17 0:09', '12/2/17 0:47', '13/2/17 1:40']})

data['duration'] = pd.to_datetime(data['second_time']) - pd.to_datetime(data['first_time'])
data['duration'] = data['duration'].astype('timedelta64[m]')

Which will give:

     first_time   second_time  duration
0  11/2/17 0:05  11/2/17 0:09       4.0
1  12/2/17 0:40  12/2/17 0:47       7.0
2  13/2/17 0:12  13/2/17 1:40      88.0
Simon
  • 9,762
  • 15
  • 62
  • 119