9

I have a pandas dataframe with a column like:

In [96]: data['difference']
Out[96]: 
0                 NaT
1     1 days 21:34:30
2     0 days 16:57:36
3     0 days 00:16:51
4     0 days 15:52:38
5     0 days 14:19:34
6     0 days 02:54:46
7     1 days 04:21:28
8     0 days 01:58:55
9     0 days 10:30:35
10    0 days 07:53:04
....
Name: difference, dtype: timedelta64[ns]

I want to create next to it a column with integers corresponding to the days values in this column.

Blue Moon
  • 4,421
  • 20
  • 52
  • 91

3 Answers3

14

This should convert your timedelta64[ns] type to float64 representing days:

data['difference'].astype('timedelta64[D]')
Nhor
  • 3,860
  • 6
  • 28
  • 41
  • 3
    This method `astype('timedelta64[D]')`(about 96ms) is much more efficient than `dt.days.`(about 24s) for 4,000,000 rows. – Pengju Zhao Jul 13 '17 at 01:53
  • 1
    See [this Q&A](https://stackoverflow.com/q/76119709/10197418): `.astype('timedelta64[D]')` is **not supported anymore** in newer versions of pandas. – FObersteiner Apr 27 '23 at 11:29
9

You can use dt.days to extract just days from your series,

df.difference
Out[117]: 
0   -1 days +00:00:05
1                 NaT
2   -1 days +00:00:05
3     1 days 00:00:00
dtype: timedelta64[ns]

df.difference.dt.days
Out[118]: 
0    -1
1   NaN
2    -1
3     1
dtype: float64

All other component extracts,

dr
Out[93]: 
0   -1 days +00:00:05
1                 NaT
2     1 days 02:04:05
3     1 days 00:00:00
dtype: timedelta64[ns]

dr.dt.components
Out[95]: 
   days  hours  minutes  seconds  milliseconds  microseconds  nanoseconds
0    -1      0        0        5             0             0            0
1   NaN    NaN      NaN      NaN           NaN           NaN          NaN
2     1      2        4        5             0             0            0
3     1      0        0        0             0             0            0
WoodChopper
  • 4,265
  • 6
  • 31
  • 55
  • 1
    This method is pretty slow. In contrast, the method `astype('timedelta64[D]')`(about 96ms) is much more efficient than `dt.days.`(about 24s) for 4,000,000 rows. – Pengju Zhao Jul 13 '17 at 01:58
3

According to pandas documentation, you can extract days using astype method of timedelta64 object and the result is of type float64.

data['difference'].astype('timedelta64[D]')
Kenly
  • 24,317
  • 7
  • 44
  • 60