extracting days from a numpy.timedelta64 value

Question

I am using pandas/Python and I have two datetime series, s1 and s2, that were generated using the 'to_datetime' function on a field of the df containing dates/times.

When I subtract s1 from s2

s3 = s2 - s1

I get a series, s3, of type

timedelta64[ns]

0    385 days, 04:10:36
1     57 days, 22:54:00
2    642 days, 21:15:23
3    615 days, 00:55:44
4    160 days, 22:13:35
5    196 days, 23:06:49
6     23 days, 22:57:17
7      2 days, 22:17:31
8    622 days, 01:29:25
9     79 days, 20:15:14
10    23 days, 22:46:51
11   268 days, 19:23:04
12                  NaT
13                  NaT
14   583 days, 03:40:39

When I look at one element of the series:

s3[10]

I get something like this:

numpy.timedelta64(2069211000000000,'ns')

How do I extract days from s3 and maybe keep them as integers (I'm not so interested in hours/mins etc.)?

just FYI, about to merge this functionality into pandas master: https://github.com/pydata/pandas/pull/4534 (you can do this on 0.12 and before by: ``s.apply(lambda x: x / np.timedelta64(1,'D'))``) — Jeff, Aug 13 '13 at 17:47
Even if you're not interested in hours it might be relevant whether 2 days 23:59 is mapped to 2 days or to 3? — Rriskit, Nov 17 '22 at 12:44

Viktor Kerkez · Accepted Answer · 2013-08-13T19:34:01.900

191

You can convert it to a timedelta with day precision. To extract the integer value of days, divide it by a timedelta of one day.

>>> x = np.timedelta64(2069211000000000, 'ns')
>>> days = x.astype('timedelta64[D]')
>>> days / np.timedelta64(1, 'D')
23

Or, as @PhillipCloud suggested, just use days.astype(int), since the timedelta is just a 64-bit integer that is interpreted in various ways depending on the second parameter you passed in ('D', 'ns', ...).
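For reference, here is a minimal sketch of the scalar approach, using the value from the question. It assumes a reasonably recent NumPy (floor division between timedelta64 values needs 1.16 or later, as far as I know). Note that plain `/` gives a fractional number of days, while the cast to 'D' and `//` both drop the partial day.

```python
import numpy as np

# The element from the question: 23 days, 22:46:51 expressed in nanoseconds.
x = np.timedelta64(2069211000000000, 'ns')

# Cast to day precision (drops the partial day), then read the integer payload.
days = x.astype('timedelta64[D]')
as_int = days.astype(int)

# Dividing the original value by one day keeps the fractional part.
as_float = x / np.timedelta64(1, 'D')

# Floor division returns whole days directly as an integer.
whole_days = x // np.timedelta64(1, 'D')

# .item() on a day-precision value gives a datetime.timedelta, which has .days.
via_item = days.item().days

print(as_int, as_float, whole_days, via_item)
```

If the partial day should count (say, 23 days 22:46 treated as 24), round the fractional result instead of truncating it.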

You can find more about it here.

edited Aug 13 '13 at 19:34

answered Aug 13 '13 at 17:28

Viktor Kerkez



You can also do `days.item().days` or `days.astype(int)`. – Phillip Cloud Aug 13 '13 at 17:35

more recent versions of pandas support a full fledged Timedelta type, see docs here: http://pandas.pydata.org/pandas-docs/stable/timedeltas.html – Jeff Feb 25 '15 at 00:24
This is a good candidate for .apply. You can do this in the same line where you compute column values by putting .apply(lambda x: x/np.timedelta64(1,'D')) at the end to apply the conversion at the column level. e.g. s3=(s1-s2).apply(lambda x: x/np.timedelta64(1,'D')). – Ezekiel Kruglick Nov 13 '15 at 00:01

This method, `astype('timedelta64[D]')` (about 96 ms), is much more efficient than `dt.days` (about 24 s) for 4,000,000 rows. – Pengju Zhao Jul 13 '17 at 02:00

score 66 · Answer 2 · answered Dec 19 '16 at 18:30

Use dt.days to obtain the days attribute as integers.

For example:

In [14]: s = pd.Series(pd.timedelta_range(start='1 days', end='12 days', freq='3000T'))

In [15]: s
Out[15]: 
0    1 days 00:00:00
1    3 days 02:00:00
2    5 days 04:00:00
3    7 days 06:00:00
4    9 days 08:00:00
5   11 days 10:00:00
dtype: timedelta64[ns]

In [16]: s.dt.days
Out[16]: 
0     1
1     3
2     5
3     7
4     9
5    11
dtype: int64
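The series in the question also contains NaT, which the example above does not cover. Below is a small sketch of how that case might be handled; the timestamps are made up to reproduce two rows of the kind shown in the question plus a missing value. With NaT present, `dt.days` typically comes back as floats with NaN, so a nullable integer dtype is one way to keep whole numbers. It also shows that `dt.days` drops the partial day, and how to round to the nearest day instead if that is what you want.

```python
import pandas as pd

# Hypothetical timestamps; the last pair is missing to produce a NaT.
s1 = pd.to_datetime(pd.Series(["2013-01-01 00:00:00", "2013-01-01 00:00:00", None]))
s2 = pd.to_datetime(pd.Series(["2013-01-24 22:46:51", "2013-01-03 23:59:00", None]))
s3 = s2 - s1  # timedelta series: 23 days 22:46:51, 2 days 23:59:00, NaT

# dt.days keeps only the whole-day part; NaT becomes a missing value.
days = s3.dt.days

# Nullable integer dtype keeps the days as integers and the gap as <NA>.
days_nullable = days.astype("Int64")

# Round to the nearest day first if 2 days 23:59 should count as 3.
rounded_days = s3.dt.round("D").dt.days.astype("Int64")

print(days_nullable.tolist())
print(rounded_days.tolist())
```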

More generally, you can use the .components property to access a reduced form of the timedelta.

In [17]: s.dt.components
Out[17]: 
   days  hours  minutes  seconds  milliseconds  microseconds  nanoseconds
0     1      0        0        0             0             0            0
1     3      2        0        0             0             0            0
2     5      4        0        0             0             0            0
3     7      6        0        0             0             0            0
4     9      8        0        0             0             0            0
5    11     10        0        0             0             0            0

Now, to get the hours attribute:

In [23]: s.dt.components.hours
Out[23]: 
0     0
1     2
2     4
3     6
4     8
5    10
Name: hours, dtype: int64
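One thing to watch: `components.hours` is only the hours field (0 to 23), not the total duration in hours. A short sketch of the difference, using two of the values from the series above:

```python
import pandas as pd

# Two values taken from the example series above.
s = pd.Series(pd.to_timedelta(["1 days 00:00:00", "3 days 02:00:00"]))

# The hours *field* of each timedelta (the part left over after whole days).
hour_field = s.dt.components.hours

# The *total* duration in hours: divide by a one-hour Timedelta ...
total_hours = s / pd.Timedelta(hours=1)

# ... or go through total_seconds().
total_hours_alt = s.dt.total_seconds() / 3600

print(hour_field.tolist())
print(total_hours.tolist())
```

The same distinction applies to days versus the other fields: `dt.days` is safe for "whole days elapsed", but minutes or seconds from `.components` are not totals.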

+1 - This is the best way to do this currently since the pandas package has progressed since this question was asked. — Austin A, Aug 15 '19 at 15:00

score 9 · Answer 3 · answered Nov 17 '15 at 23:02

Suppose you have a timedelta series:

import pandas as pd
from datetime import datetime
z = pd.DataFrame({'a':[datetime.strptime('20150101', '%Y%m%d')],'b':[datetime.strptime('20140601', '%Y%m%d')]})

td_series = (z['a'] - z['b'])

One way to convert this timedelta column or series is to cast it to a Timedelta object (pandas 0.15.0+) and then extract the days from the object:

td_series.astype(pd.Timedelta).apply(lambda l: l.days)

Another way is to cast the series as a timedelta64 in days, and then cast it as an int:

td_series.astype('timedelta64[D]').astype(int)
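A caveat, as far as I can tell: on recent pandas versions (2.x), casting a Series to 'timedelta64[D]' may raise because only second-or-finer resolutions are supported, and the `astype(pd.Timedelta)` form may not work either. A sketch of two alternatives that should behave the same across versions, using the same dates as above:

```python
import pandas as pd

# Same dates as in the answer, built with pd.Timestamp for brevity.
z = pd.DataFrame({'a': [pd.Timestamp('2015-01-01')],
                  'b': [pd.Timestamp('2014-06-01')]})
td_series = z['a'] - z['b']

# Option 1: the .dt accessor returns the whole-day part.
days_attr = td_series.dt.days

# Option 2: floor-divide by a one-day Timedelta to get whole days.
days_floor = td_series // pd.Timedelta(days=1)

print(days_attr.iloc[0], days_floor.iloc[0])
```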

This way is quite fast compared to other solutions. – gm1991 Oct 19 '22 at 22:13

score 0 · Answer 4 · answered Nov 15 '22 at 13:21

First, convert the date/time columns to pandas datetime using:

## Convert the columns to pandas datetime
df['Start_Time'] = pd.to_datetime(df['Start_Time'], errors='coerce')
df['End_Time'] = pd.to_datetime(df['End_Time'], errors='coerce')

Once that is done, use the following command to subtract the two dates and express the difference in minutes:

df["Duration_after subtraction"] = (df['End_Time'] - df['Start_Time']) / np.timedelta64(1, 'm')

To convert to hours, use 'h' instead of 'm'.
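Put together, the steps above might look like the sketch below. The DataFrame contents and the `Duration_min` / `Duration_h` column names are made up for illustration; the unparseable entry is there to show what `errors='coerce'` does (it becomes NaT, and the duration becomes NaN).

```python
import numpy as np
import pandas as pd

# Hypothetical data; 'not a date' is deliberately unparseable.
df = pd.DataFrame({
    'Start_Time': ['2022-11-15 08:00', '2022-11-15 09:30', 'not a date'],
    'End_Time':   ['2022-11-15 08:45', '2022-11-15 12:00', '2022-11-15 10:00'],
})

# Convert both columns; values that cannot be parsed become NaT.
for col in ('Start_Time', 'End_Time'):
    df[col] = pd.to_datetime(df[col], errors='coerce')

# Subtract, then divide by a unit timedelta to get plain floats.
elapsed = df['End_Time'] - df['Start_Time']
df['Duration_min'] = elapsed / np.timedelta64(1, 'm')
df['Duration_h'] = elapsed / np.timedelta64(1, 'h')

print(df[['Duration_min', 'Duration_h']])
```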
