49

I want to pass a datetime array to a Numba function (which cannot be vectorised and would otherwise be very slow). I understand Numba supports numpy.datetime64. However, it seems it supports datetime64[D] (day precision) but not datetime64[ns] (nanosecond precision) (I learnt this the hard way: is it documented?).

I tried to convert from datetime64[ns] to datetime64[D], but can't seem to find a way!

I have summarised my problem with the minimal code below. If you run testdf(mydates), which is datetime64[D], it works fine. If you run testdf(dates_input), which is datetime64[ns], it doesn't. Note that this example simply passes the dates to the Numba function, which doesn't (yet) do anything with them. I try to convert dates_input to datetime64[D], but the conversion doesn't work. In my original code I read from a SQL table into a pandas dataframe, and need a column which changes the day of each date to the 15th.

import numba
import numpy as np
import pandas as pd
import datetime

mydates = np.array(['2010-01-01', '2011-01-02']).astype('datetime64[D]')
df = pd.DataFrame()
df["rawdate"] = mydates
df["month_15"] = df["rawdate"].apply(lambda r: datetime.date(r.year, r.month, 15))

dates_input = df["month_15"].astype('datetime64[D]')
print(dates_input.dtype)  # Why datetime64[ns] and not datetime64[D] ??


@numba.jit(nopython=True)
def testdf(dates):
    return 1

print(testdf(mydates))

The error I get if I run testdf(dates_input) is:

numba.typeinfer.TypingError: Failed at nopython (nopython frontend)
Var 'dates' unified to object: dates := {pyobject}
Pythonista anonymous
  • This is a really useful question, but it was for some reason very difficult to find just through search. I received a similar error when trying to use `np.busday_count` on pandas data, which read: `TypeError: Iterator operand 0 dtype could not be cast from dtype(' – Michael K Mar 28 '16 at 14:18
  • The easiest thing to do is probably to pass a numeric representation of datetime (e.g. Unix time) to the numba jit-compiled function. – FObersteiner May 30 '23 at 19:01

3 Answers

62

Note (2023-05-30): This answer only works for pandas version <2. Pandas 2.0.0 was released on 2023-04-03. See relevant changelog entry.

Series.astype converts all date-like objects to datetime64[ns].

To convert to datetime64[D], use values to obtain a NumPy array before calling astype:

dates_input = df["month_15"].values.astype('datetime64[D]')
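As a small, hedged illustration of the line above: the sample dates are made up, and `pd.to_datetime` stands in for however the column was actually built. The point is only that the unit change happens on the NumPy array, not on the Series.

```python
import numpy as np
import pandas as pd

# Hypothetical sample column; real data would come from the DataFrame.
s = pd.Series(pd.to_datetime(["2010-01-15", "2011-01-15"]))

# Drop down to the underlying NumPy array first, then change the unit.
days = s.values.astype("datetime64[D]")

print(s.dtype)     # a pandas datetime64 dtype (nanoseconds on pandas < 2)
print(days.dtype)  # datetime64[D]
```

The result is a plain NumPy array, so it lives outside the DataFrame; assigning it back to a column would hand it to pandas again.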

Note that NDFrames (such as Series and DataFrames) can only hold datetime-like objects with dtype datetime64[ns]. The automatic conversion of all datetime-likes to a common dtype simplifies subsequent date computations, but it makes it impossible to store, say, datetime64[s] objects in a DataFrame column. Pandas core developer Jeff Reback explains:

"We don't allow direct conversions because its simply too complicated to keep anything other than datetime64[ns] internally (nor necessary at all)."


Also note that even though df['month_15'].astype('datetime64[D]') has dtype datetime64[ns]:

In [29]: df['month_15'].astype('datetime64[D]').dtype
Out[29]: dtype('<M8[ns]')

when you iterate through the items in the Series, you get pandas Timestamps, not datetime64[ns]s.

In [28]: df['month_15'].astype('datetime64[D]').tolist()
Out[28]: [Timestamp('2010-01-15 00:00:00'), Timestamp('2011-01-15 00:00:00')]

Therefore, it is not clear that Numba actually has a problem with datetime64[ns]; it might just have a problem with Timestamps. Sorry, I can't check this -- I don't have Numba installed.

However, it might be useful for you to try

testdf(df['month_15'].astype('datetime64[D]').values)

since df['month_15'].astype('datetime64[D]').values is truly a NumPy array of dtype datetime64[ns]:

In [31]: df['month_15'].astype('datetime64[D]').values.dtype
Out[31]: dtype('<M8[ns]')

If that works, then you don't have to convert everything to datetime64[D]; you just have to pass NumPy arrays -- not Pandas Series -- to testdf.
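If passing datetime arrays still proves awkward, another option (in the spirit of the numeric-representation suggestion in the comments on the question) is to hand the function plain int64 day numbers. The sketch below is only an assumption about how that could look: `count_weekend_days` is a hypothetical helper, written as a plain loop of the kind that could be given to `numba.njit`, but left undecorated here so the snippet runs without Numba installed.

```python
import numpy as np

def count_weekend_days(day_numbers):
    # day_numbers: int64 days since 1970-01-01, which was a Thursday.
    total = 0
    for d in day_numbers:
        # (d + 3) % 7 maps Monday..Sunday to 0..6.
        if (d + 3) % 7 >= 5:
            total += 1
    return total

# Hypothetical input with nanosecond precision, as pandas would hand it over.
dates = np.array(["2010-01-15", "2011-01-15"], dtype="datetime64[ns]")

# Convert the unit to days, then expose the values as plain integers.
day_numbers = dates.astype("datetime64[D]").astype(np.int64)

weekend_count = count_weekend_days(day_numbers)
print(weekend_count)
```

Working on integers sidesteps the question of which datetime64 units the compiled function accepts, at the cost of converting back with `astype('datetime64[D]')` if dates are needed afterwards.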

Cornelius Roemer
unutbu
  • Thank you! May I ask why this is, though? I mean, I can't think of any logical reason why a date, created with only year, month and day, gets converted to millisecond precision, and cannot be converted back to day precision unless we call .values. Is it a bug? Or am I missing a fundamental reason here? Is it documented anywhere? My big frustration with Python for data analysis (which, yes, I know, is only one of the many things Python can do, but I'm not interested in the others!) is indeed the poor quality of the documentation, especially vs a commercial package like Matlab – Pythonista anonymous Aug 10 '15 at 11:28
  • Pandas does a lot of things for you which generally are convenient. Unfortunately, sometimes that means it ends up doing things which it thinks is what you want (like converting all dates to datetime64[ns]/Timestamps) when in fact you want something else. I don't know if this particular issue is documented somewhere. – unutbu Aug 10 '15 at 11:44
  • that's not really what's going on here, though. In my example, Pandas doesn't have to guess what precision I need (day or millisecond), because I explicitly tell Pandas (.astype(datetime64['D'] ). This sounds more like a bug – Pythonista anonymous Aug 10 '15 at 11:52
  • Somewhere along the line, Pandas made the decision to funnel all date-like data into one common data type: `datetime64[ns]`. There are advantages to doing this: It makes comparison and date arithmetic easier. A consequence of this is that **there are no Series of dtype `datetime64[D]`**. Maybe `df['month_15'].astype('datetime64[D]')` should raise an exception instead of silently converting to `datetime64[ns]`, but as long as Pandas maintains its funnel-everything-to-datetime64[ns] policy, `df['month_15'].astype('datetime64[D]')` is not going to return a Series of dtype `datetime64[D]`. – unutbu Aug 10 '15 at 12:11
  • By the way, `datetime64[ns]` has nanosecond precision, not millisecond precision. – unutbu Aug 10 '15 at 12:13
  • @Pythonistaanonymous: After this answer was written, "wesm" of Pandas wrote a detailed comment with some of the backstory and issues with supporting other datetime64 units, here: https://github.com/pandas-dev/pandas/issues/7307#issuecomment-224180563 - many of the comments on that issue are relevant here. – John Zwinck Feb 28 '17 at 09:40
  • This line was so useful - solved my whole problem: ```you don't have to convert everything to datetime64[D], you just have to pass NumPy arrays -- not Pandas Series -- to testf.``` – sharon Sep 21 '21 at 23:37
  • Note: this answer does not work with pandas version >=2 (released 2023-04-03) – Cornelius Roemer May 30 '23 at 10:08
1

Ran into the same error when calculating number of business days between two dates:

from pandas.tseries.offsets import MonthBegin
import numpy as np
import pandas as pd

# Calculate the beginning of the month from a given date
df['Month_Begin'] = pd.to_datetime(df['MyDateColumn'])+ MonthBegin(-1)

# Calculate # of Business Days
# Convert dates to string to prevent type error [D]
df['TS_Period_End_Date'] = df['TS_Period_End_Date'].dt.strftime('%Y-%m-%d')
df['Month_Begin'] = df['Month_Begin'].dt.strftime('%Y-%m-%d')

df['Biz_Days'] = np.busday_count(df['Month_Begin'], df['MyDateColumn']) #<-- Error if not converted into strings.

My workaround was to convert the dates to strings using .dt.strftime('%Y-%m-%d'). It worked in my particular case.
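An alternative that avoids the round trip through strings, sketched here under assumptions: the column names `start` and `end` and the sample dates are invented for illustration. Both columns are converted to day-precision NumPy arrays before calling `np.busday_count`.

```python
import numpy as np
import pandas as pd

# Hypothetical frame; 2021-03-01 is a Monday.
df = pd.DataFrame({
    "start": pd.to_datetime(["2021-03-01", "2021-03-01"]),
    "end": pd.to_datetime(["2021-03-08", "2021-03-15"]),
})

# Convert each column to a NumPy array with day precision.
start = df["start"].values.astype("datetime64[D]")
end = df["end"].values.astype("datetime64[D]")

# Counts business days from start (inclusive) to end (exclusive).
df["biz_days"] = np.busday_count(start, end)
print(df["biz_days"].tolist())
```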

Arthur D. Howland
0

NumPy datetime64 objects support different resolution levels, some of which have a corresponding Python datetime object. For example, datetime64[us] can be converted to datetime.datetime, datetime64[D] to datetime.date, etc. So wherever datetime64[D] is needed, Python's datetime.date can be used; the same goes for datetime64[us] and datetime.datetime. Sadly, Python's datetime doesn't support nanosecond resolution, so datetime64[ns] values become integers when converted to Python objects.
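The correspondence described above can be checked with a short sketch; the sample timestamps are arbitrary, and `.item()` is used here simply as one way to ask NumPy for the matching Python object.

```python
import datetime
import numpy as np

# Convert scalars of three different resolutions to Python objects.
as_day = np.datetime64("2020-01-01", "D").item()
as_micro = np.datetime64("2020-01-01T12:30:00", "us").item()
as_nano = np.datetime64("2020-01-01", "ns").item()

print(type(as_day).__name__)    # date
print(type(as_micro).__name__)  # datetime
print(type(as_nano).__name__)   # int
```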

So if you got an error that says <M8[ns] cannot be cast to <M8[D], of which an example is:

TypeError: Iterator operand 0 dtype could not be cast from 
dtype('<M8[ns]') to dtype('<M8[D]') according to the rule 'safe'

then try casting the array to the appropriate resolution with astype (similar to converting a datetime.datetime to datetime.date). Note that view would only reinterpret the underlying 64-bit integers without converting between units:

x = np.arange('2020-01-01','2020-01-05', dtype='datetime64[D]')
y = np.arange('2020-01-01','2020-01-05', 10**9*3600*24, dtype='datetime64[ns]')
np.busday_count(x, y)                           # <---- error
np.busday_count(x, y.astype('datetime64[D]'))   # <---- OK
#                  ^^^^^^^^^^^^^^^^^^^^^^^^^    # cast to day resolution
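One caveat worth checking before relying on a change of resolution: `ndarray.view` reinterprets the stored 64-bit integers under the new unit, whereas `astype` converts the values between units. The sketch below (sample dates arbitrary) compares the two on the same array.

```python
import numpy as np

x = np.arange("2020-01-01", "2020-01-05", dtype="datetime64[D]")
y = x.astype("datetime64[ns]")  # same dates, nanosecond precision

converted = y.astype("datetime64[D]")    # values rescaled from ns to days
reinterpreted = y.view("datetime64[D]")  # ns counts relabelled as day counts

print((converted == x).all())      # True: the dates are preserved
print((reinterpreted == x).any())  # False: the dates are not preserved

# With matching units, busday_count accepts the arrays.
counts = np.busday_count(x, converted)
print(counts.tolist())             # [0, 0, 0, 0] since begin equals end
```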

If the data comes from a pandas dataframe as in the OP [1], then there's dt.date to convert the values to datetime.date objects; make sure to convert to a list, so that datetime.date objects can be used as they are. [2]

df = pd.DataFrame({'x': x})
df['x'].dtype                                 # dtype('<M8[ns]')

np.is_busday(df['x'])                         # error
np.is_busday(df['x'].dt.date.tolist())        # OK
#                   ^^^^^^^^^^^^^^^^^         # convert to a list of datetime.date objects

[1] As @unutbu mentions, pandas only supports datetime64 in nanosecond resolution, so datetime64[D] in a numpy array becomes datetime64[ns] when stored in a pandas column.

[2] datetime.date is not a supported dtype in pandas, so any column/Series storing them becomes object dtype, which won't do if a function expects datetime64[D] or datetime.date type objects. So they must be converted into a list, so that each item can be read in as datetime.date.

cottontail