-1

I have tried to calculate the number of business days between two date (stored in separate columns in a dataframe ).

    MonthBegin  MonthEnd
0   2014-06-09  2014-06-30
1   2014-07-01  2014-07-31
2   2014-08-01  2014-08-31
3   2014-09-01  2014-09-30
4   2014-10-01  2014-10-31

I have tried to apply numpy.busday_count but I get the following error:

Iterator operand 0 dtype could not be cast from dtype('<M8[ns]') to dtype('<M8[D]') according to the rule 'safe'

I have tried to change the type into Timestamp as the following :

Timestamp('2014-08-31 00:00:00')

or datetime :

datetime.date(2014, 8, 31)

or to numpy.datetime64:

numpy.datetime64('2014-06-30T00:00:00.000000000')

Anyone knows how to fix it?

Note 1: I have passed tried np.busday_count in two way : 1. Passing dataframe columns, t['Days']=np.busday_count(t.MonthBegin,t.MonthEnd)

  1. Passing arrays np.busday_count(dt1,dt2)

Note2: My dataframe has over 150K rows so I need to use an efficient algorithm

Jeff Tehrani
  • 185
  • 1
  • 3
  • 8

4 Answers4

2

You can using bdate_range, also I corrected your input , since the most of MonthEnd is early than the MonthBegin

[len(pd.bdate_range(x,y))for x,y in zip(df['MonthBegin'],df['MonthEnd'])]
Out[519]: [16, 21, 22, 23, 20]
BENY
  • 317,841
  • 20
  • 164
  • 234
  • Since my dataframe has over 150K rows do you have a faster algorithm in your mind, this algorithm is really time consuming – Jeff Tehrani Sep 18 '18 at 14:02
  • @JeffTehrani `[np.busday_count(x[0],x[1]) for x in df.values.astype('datetime64[D]')] ` – BENY Sep 18 '18 at 14:06
0

You need to provide the template in which your dates are written.

 a = datetime.strptime('2014-06-9', '%Y-%m-%d')

Calculate this for your

b = datetime.strptime('2014-06-30', '%Y-%m-%d')

Now their difference

c = b-a
c.days 

which gives you the difference 21 days, You can now use list comprehension to get the difference between two dates as days. will give you datetime.timedelta(21), to convert it into days, just use

GraphicalDot
  • 2,644
  • 2
  • 28
  • 43
0

I think the best way to do is

df.apply(lambda row : np.busday_count(row['MBegin'],row['MEnd']),axis=1)

For my dataframe df as below:

      MBegin          MEnd
0   2011-01-01  2011-02-01
1   2011-01-10  2011-02-10
2   2011-01-02  2011-02-02

doing :

df['MBegin'] = df['MBegin'].values.astype('datetime64[D]')
df['MEnd'] = df['MEnd'].values.astype('datetime64[D]')
df['busday'] = df.apply(lambda row : np.busday_count(row['MBegin'],row['MEnd']),axis=1)

>>df

     MBegin         MEnd   busday
0   2011-01-01  2011-02-01  21
1   2011-01-10  2011-02-10  23
2   2011-01-02  2011-02-02  22
jkhadka
  • 2,443
  • 8
  • 34
  • 56
0

You can modify your code to get the desired result as below:

df = pd.DataFrame({'MonthBegin': ['2014-06-09', '2014-08-01', '2014-09-01', '2014-10-01', '2014-11-01'],
                   'MonthEnd': ['2014-06-30', '2014-08-31', '2014-09-30', '2014-10-31', '2014-11-30']})
df['MonthBegin'] = df['MonthBegin'].astype('datetime64[ns]')
df['MonthEnd'] = df['MonthEnd'].astype('datetime64[ns]')
df['BDays'] = np.busday_count(df['MonthBegin'].tolist(), df['MonthEnd'].tolist())
print(df)

  MonthBegin   MonthEnd  BDays
0 2014-06-09 2014-06-30     15
1 2014-08-01 2014-08-31     21
2 2014-09-01 2014-09-30     21
3 2014-10-01 2014-10-31     22
4 2014-11-01 2014-11-30     20

Additionally numpy.busday_count has few other optional arguments like weekmask, holidays ... which you can use according to your need.

halfer
  • 19,824
  • 17
  • 99
  • 186
nandneo
  • 495
  • 4
  • 13