1

Following the answer here and and here. I first change the dataframe to a time object

data['start'] = pd.to_datetime(data_session['start'], format = '%H:%M:%S').dt.time

data['end'] = pd.to_datetime(data['end'], format = '%H:%M:%S').dt.time
data['minutes'] = (data['end'] - data['start']).dt.minutes
data['Hour'] = data['start'].dt.hour

I get this error:

Error:TypeError: unsupported operand type(s) for -: 'datetime.time' and 'datetime.time'

I checked what the data frame info as:

data.info()
    start    10000 non-null object
    end      10000 non-null object

The column is still an object type. Why doesn't it convert to datetime64? Why am I not able to access it using the dt accessor?

My last try was:

data['start'] = pd.to_datetime(data_session['start'], format = '%H:%M:%S')
data['end'] = pd.to_datetime(data['end'], format = '%H:%M:%S')
data['minutes'] = (data['end'] - data['start'])

data.info()
    start    10000 non-null datetime64[ns]
    end      10000 non-null datetime64[ns]

This solution worked partially as I got the time difference but my start and end column had an additional date included.

e.g: 06:10:10 -> 1900-01-01 06:10:10

My goals are:

  • Make a new column with only the hour of one of the series
  • Make a new column with time difference in minutes
may
  • 1,073
  • 4
  • 14
  • 31

2 Answers2

1

I think need convert to_timedeltas and then convert to minutes and hours:

data = pd.DataFrame({'end':['12:01:04','15:21:00'],
                     'start':['10:01:04','5:41:00']})

data['start'] = pd.to_timedelta(data['start'])
data['end'] = pd.to_timedelta(data['end'])

data['minutes'] = (data['end'] - data['start']).dt.total_seconds() / 60
data['Hour'] = data['start'].astype('timedelta64[h]').astype(int)
print (data)
       end    start  minutes  Hour
0 12:01:04 10:01:04      120    10
1 15:21:00 05:41:00      580     5
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • but the minutes are not calculate correct when the time is switching. For example: 23:59:44 to 00:01:41 the minutes are -1439, which is wrong – may May 14 '18 at 17:22
  • 1
    @may - You are right. Minutes was wrong, but answer was changed. Please check it. – jezrael May 14 '18 at 18:37
  • 1
    Thanks! I prefer your answer because is easier to follow. Thanks a lot! – may May 14 '18 at 19:27
1

This is one way using operator.attrgetter. Data from @jezrael.

from operator import attrgetter

for col in ['start', 'end']:
    data[col] = pd.to_timedelta(data[col])

data['minutes'] = (data['end'] - data['start']).apply(attrgetter('seconds')) / 60
data['hour'] = (data['start'].apply(attrgetter('seconds')) / 60**2).astype(int)

print(data)

       end    start  minutes  hour
0 12:01:04 10:01:04    120.0    10
1 15:21:00 05:41:00    580.0     5
jpp
  • 159,742
  • 34
  • 281
  • 339