Let's say you have this DataFrame:
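import numpy as np
import pandas as pd

df = pd.DataFrame(
    data=['2014-04-07 10:55:35.087000+00:00',
          '2014-04-07 13:59:37.251500+00:00',
          '2014-04-02 13:23:59.629000+00:00',
          '2014-04-07 12:17:48.182000+00:00',
          '2014-04-06 17:00:23.912000+00:00'],
    columns=['timestamp'],
)
# Parse the strings into timezone-aware (UTC) timestamps
df['timestamp'] = pd.to_datetime(df['timestamp'], utc=True)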
And you want to create a new column whose value is 1 if the timestamp falls on a weekday and 0 otherwise. I would run something like this:
df['weekday'] = df['timestamp'].apply(lambda x: 1 if x.weekday() < 5 else 0 )
So far so good. However, in my case I have about 10 million rows of such timestamp values and it just takes forever to run. So, I looked around for vectorization options and I found numpy.where(). But, of course, this does not work:
np.where(df['timestamp'].weekday() < 5, 1, 0)
So, is there a way to access the .weekday() method of the timestamps when using numpy.where, or is there any other way to produce the weekday column efficiently for 10 million rows? Thanks.
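For reference, here is a minimal sketch of one vectorized direction, assuming the column really holds a datetime64 dtype (parsed here with pd.to_datetime, which is my assumption about the setup). The .dt accessor exposes weekday for the whole Series at once, so no per-row lambda is needed. The column name weekday_np is only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data mirroring the frame above, parsed explicitly as UTC timestamps.
df = pd.DataFrame({
    'timestamp': pd.to_datetime([
        '2014-04-07 10:55:35.087000+00:00',
        '2014-04-07 13:59:37.251500+00:00',
        '2014-04-02 13:23:59.629000+00:00',
        '2014-04-07 12:17:48.182000+00:00',
        '2014-04-06 17:00:23.912000+00:00',
    ], utc=True)
})

# .dt.weekday is evaluated column-wise (Monday=0 ... Sunday=6),
# so the comparison yields a boolean Series in one pass.
is_weekday = df['timestamp'].dt.weekday < 5

# Cast the boolean mask to 0/1 integers.
df['weekday'] = is_weekday.astype(int)

# The same result via numpy.where, operating on the mask.
df['weekday_np'] = np.where(is_weekday, 1, 0)

print(df['weekday'].tolist())  # [1, 1, 1, 1, 0]
```

Both columns come out the same; the astype(int) form just skips the extra numpy.where call. I have not timed this on 10 million rows, so treat any speedup as something to measure rather than a given.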