
I have the following problem: I would like to calculate the number of business days between two dates. Example:

import numpy as np
import pandas as pd

pokus = {"start_date" : "2022-01-01 10:00:00" , "end_date" : "2022-01-01 17:00:00" }
df = pd.DataFrame(pokus, index=[0])
cas_df["bus_days"] = np.busday_count(pd.to_datetime(df["start_date"]) , pd.to_datetime(df["end_date"]))

This raises a confusing error:

Traceback (most recent call last):
  File "/home/vojta/.local/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 3251, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-17-8910d714721b>", line 3, in <module>
    cas_df["bus_days"] = np.busday_count(pd.to_datetime(df["start_date"]) , pd.to_datetime(df["end_date"]))
  File "<__array_function__ internals>", line 180, in busday_count
TypeError: Iterator operand 0 dtype could not be cast from dtype('<M8[ns]') to dtype('<M8[D]') according to the rule 'safe'

How can I fix this?

Sunderam Dubey
vojtam

2 Answers


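np.busday_count only accepts datetime64[D], but, as explained in this answer, pandas DataFrames and Series hold datetime64[ns]. (pandas 2.0 and later also support second, millisecond, and microsecond resolution, but still not day resolution.)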

So we can convert the start and end date columns to a NumPy array of type datetime64[D] and then pass these values to np.busday_count:

days = df[['start_date', 'end_date']].to_numpy(dtype='datetime64[D]')
cas_df["bus_days"] = np.busday_count(days[:, 0], days[:, 1])

(You could also use .to_numpy().astype('datetime64[D]').)
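For reference, here is a minimal self-contained sketch of the conversion described above. The sample dates and the choice to write the result back into df (rather than cas_df, which is not defined in the question) are assumptions made for illustration only.

```python
import numpy as np
import pandas as pd

# Illustrative data (not from the question): 2022-01-03 is a Monday,
# 2022-01-07 a Friday, 2022-01-10 the following Monday.
df = pd.DataFrame({
    "start_date": ["2022-01-03 10:00:00", "2022-01-07 09:00:00"],
    "end_date": ["2022-01-07 17:00:00", "2022-01-10 12:00:00"],
})

# Parse the strings, then drop the time of day by casting to day resolution.
start = pd.to_datetime(df["start_date"]).to_numpy().astype("datetime64[D]")
end = pd.to_datetime(df["end_date"]).to_numpy().astype("datetime64[D]")

# busday_count includes the start day and excludes the end day.
df["bus_days"] = np.busday_count(start, end)
print(df["bus_days"].tolist())  # [4, 1]
```

Note that the time of day is discarded by the cast, so two timestamps on the same day always give 0.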

Vladimir Fokow

Try this:

cas_df["bus_days"] = np.busday_count(pd.to_datetime(df["start_date"]).values.astype('datetime64[D]') , pd.to_datetime(df["end_date"]).values.astype('datetime64[D]'))
Sunderam Dubey
YfNk
  • Works. Note that .to_numpy() is preferable to .values – Vladimir Fokow Aug 25 '22 at 09:31
  • @VladimirFokow depends what you want to do. If you don't need a copy `.values` is preferable, here this is the case as we don't keep the object after the call to `busday_count`. What shouldn't be done is `a = df['col'].values`, then try to modify `a` – mozway Aug 25 '22 at 09:49
  • @mozway, 1) well, [this answer](https://stackoverflow.com/a/54508052/14627505) explains why to always prefer `.to_numpy()` over `.values`, if I didn't miss something. 2) Plus, `.to_numpy()` has a parameter `copy=False`, so it does not always make a copy. 3) And here, since we are casting to `'datetime64[D]'`, aren't we making a copy anyway? – Vladimir Fokow Aug 25 '22 at 10:01
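
On point 3 of the last comment, a small sketch that checks whether the cast to datetime64[D] produces a new array. The sample Series is an assumption for illustration; np.shares_memory is used only as a quick check and is not part of either answer.

```python
import numpy as np
import pandas as pd

# Illustrative Series of timestamps (not from the question).
s = pd.Series(pd.to_datetime(["2022-01-03 10:00:00", "2022-01-04 17:00:00"]))

raw = s.to_numpy()                  # array at the Series' own resolution
days = raw.astype("datetime64[D]")  # cast to day resolution

# A cast to a different dtype allocates a new buffer, so the two
# arrays should not share memory regardless of .values vs .to_numpy().
shares = np.shares_memory(raw, days)
print(shares)  # False
```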