1

let's say you have this data frame:

 df = pd.DataFrame( data =    [ '2014-04-07 10:55:35.087000+00:00',
                               '2014-04-07 13:59:37.251500+00:00',
                               '2014-04-02 13:23:59.629000+00:00',
                               '2014-04-07 12:17:48.182000+00:00',
                               '2014-04-06 17:00:23.912000+00:00'],
                    columns = ['timestamp'],
                    dtype = np.datetime64
                    )

and you want to create a new column where the values are 1 if the timestamp is a weekday or 0 if it is not. Then I would run something like this:

 df['weekday'] = df['timestamp'].apply(lambda x: 1 if x.weekday() < 5 else 0 )

So far so good. However, in my case I have about 10 million rows of such timestamp values and it just takes forever to run. So, I looked around for vectorization options and I found numpy.where(). But, of course, this does not work: np.where(df['timestamp'].weekday() < 5, 1, 0)

So, is there a way to access the .weekday() method of the timestamps when using numpy.where or is there any other way to produce the weekday column when having 10 million rows? Thanks.

ansev
  • 30,322
  • 5
  • 17
  • 31
SteliosM
  • 43
  • 4

1 Answers1

2

Use Series.dt.dayofweek / Series.dt.weekday with Series.lt and Series.astype:

df['weekday'] = df['timestamp'].dt.dayofweek.lt(5).astype(int)
print(df)
                   timestamp  weekday
0 2014-04-07 10:55:35.087000        1
1 2014-04-07 13:59:37.251500        1
2 2014-04-02 13:23:59.629000        1
3 2014-04-07 12:17:48.182000        1
4 2014-04-06 17:00:23.912000        0

I recommend you see: when should I ever want to use apply in my code

We could also use np.where:

df['weekday'] = np.where(df['timestamp'].dt.dayofweek.lt(5), 1, 0)
ansev
  • 30,322
  • 5
  • 17
  • 31
  • Fantastic. Many thanks. I was not aware of those .dt and .lt methods of Series. I will read more about them and definitely start using them more. – SteliosM Mar 29 '20 at 13:52