
This answer explains how to convert integers to hourly timesteps in Pandas. I need to do the opposite.

My dataframe df1:

   A
0  02:00:00
1  01:00:00
2  02:00:00
3  03:00:00

My expected dataframe df1:

   A         B
0  02:00:00  2
1  01:00:00  1
2  02:00:00  2
3  03:00:00  3

What I am trying:

df1['B'] = df1['A'].astype(int)

This fails with: TypeError: cannot astype a timedelta from [timedelta64[ns]] to [int32]

What is the best way to do this?

EDIT

If I try df1['B'] = df1['A'].dt.hour, I get: AttributeError: 'TimedeltaProperties' object has no attribute 'hour'

EdChum
FaCoffee

4 Answers


You can use dt.components and access the hours column:

In[7]:
df['B'] = df['A'].dt.components['hours']
df

Out[7]: 
         A  B
0 02:00:00  2
1 01:00:00  1
2 02:00:00  2
3 03:00:00  3

dt.components returns a DataFrame with one column per timedelta component:

In[8]:
df['A'].dt.components

Out[8]: 
   days  hours  minutes  seconds  milliseconds  microseconds  nanoseconds
0     0      2        0        0             0             0            0
1     0      1        0        0             0             0            0
2     0      2        0        0             0             0            0
3     0      3        0        0             0             0            0
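
One caveat worth illustrating: the hours column holds only the hours field (0 to 23), so a timedelta longer than a day loses its days. The sketch below is a minimal illustration on made-up data (the column names hours_field and total_hours are hypothetical), and combines the days and hours columns to get whole hours:

```python
import pandas as pd

# Hypothetical data: the second value spans more than one day.
df = pd.DataFrame({"A": pd.to_timedelta(["02:00:00", "1 days 03:00:00"])})

comp = df["A"].dt.components

# Only the hours field of each timedelta; days are not included.
df["hours_field"] = comp["hours"]

# Whole hours including days (minutes and seconds are dropped).
df["total_hours"] = comp["days"] * 24 + comp["hours"]

print(df)
```

For the data in the question, where every value is under a day, both columns would be identical.
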
EdChum
    Thanks - this breakdown is very useful. I don't need it right now but I would leave it for others to benefit in the future. – FaCoffee Aug 30 '18 at 09:16
    It's probably useful if someone wanted just minutes, milliseconds etc. – EdChum Aug 30 '18 at 09:17

Divide by np.timedelta64(1, 'h'):

import numpy as np

df1['B'] = df1['A'] / np.timedelta64(1, 'h')
print(df1)
         A    B
0 02:00:00  2.0
1 01:00:00  1.0
2 02:00:00  2.0
3 03:00:00  3.0
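
The division returns floats, while the question asks for integers. A minimal sketch of one way to get there, assuming the column contains no missing values (NaT would become NaN, which cannot be cast to an integer dtype) and that truncating partial hours is acceptable:

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question.
df1 = pd.DataFrame(
    {"A": pd.to_timedelta(["02:00:00", "01:00:00", "02:00:00", "03:00:00"])}
)

# Float hours first, then cast; the cast truncates any fractional part.
hours = df1["A"] / np.timedelta64(1, "h")
df1["B"] = hours.astype("int64")

print(df1)
```
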
jezrael

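Both solutions, dt.components and division by np.timedelta64, are useful, but they differ in two ways: (1) dividing by np.timedelta64 was about three times faster in the timing below, which is good to know for large dataframes, and (2) as @Sam Chats noted, the division takes the difference in days into account, whereas dt.components['hours'] returns only the hours field and ignores the days.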

For time comparison:

import pandas as pd
import numpy as np

dct = { 
      'date1': ['08:05:23', '18:07:20', '08:05:23'],
      'date2': ['09:15:24', '22:07:20', '08:54:01']
      }
df = pd.DataFrame(dct)
df['date1'] = pd.to_datetime(df['date1'], format='%H:%M:%S')
df['date2'] = pd.to_datetime(df['date2'], format='%H:%M:%S')
df['delta'] = df['date2']-df['date1']

%timeit df['np_h'] = (df['delta'] / np.timedelta64(1,'h')).astype(int)
%timeit df['td_h'] = df['delta'].dt.components['hours']

Output:
1000 loops, best of 3: 484 µs per loop
1000 loops, best of 3: 1.43 ms per loop
dcneuro
Lukas
    I don't think `components` performs the `dt.components['days']*24` multiplication. It just returns the `hours` component from the `TimedeltaProperties` object. – Sam Chats Apr 17 '19 at 14:38

Alternatively divide by pd.Timedelta(1, 'h'):

df1['B'] = df1['A'] / pd.Timedelta(1, 'h')

The result is a float column.
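
To show why the result is a float, here is a small sketch on made-up values (not the data from the question) that include partial hours and a span longer than a day:

```python
import pandas as pd

# Hypothetical values: 1.5 hours, and one day plus 45 minutes.
s = pd.Series(pd.to_timedelta(["01:30:00", "1 days 00:45:00"]))

# pd.Timedelta(1, 'h') is the same one-hour unit as pd.Timedelta(hours=1).
one_hour = pd.Timedelta(1, "h")

# Dividing by the unit gives fractional hours, with days counted as 24 hours.
hours = s / one_hour

print(hours)
```

Here the first value comes out as 1.5 and the second as 24.75, so any rounding or truncation to integers is a separate, explicit step.
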

https://pandas.pydata.org/docs/reference/api/pandas.Timedelta.html

Wtower