0

I have a pandas data frame that has the time column from 00:00:00 to 23:00:00 with type timedelta64[ns]. I want to get create a new column df['m_hours'] with 1 and 0 based on time range. For example, when the time range is 01:00:00 to 04:00:00, it should be 1 and the rest should be 0. I tried the following code,

df['m_hours'] = np.where(df['hour']>= '01:00:00'& df['hour']<= '04:00:00', '1', '0')

I get an error saying, TypeError: cannot compare a dtyped [timedelta64[ns]] array with a scalar of type [bool] and then I tried,

df['m_hours'] = np.where(df[(df['hour']>= '01:00:00')& (df['hour']<= '04:00:00'), '1', '0']

an then I got an error, SyntaxError: unexpected EOF while parsing

This post looks promising, but does not help much in my situation. Are there any other ways to create dummies for a certain range of time? I will appreciate any help.

Thank you!

As requested, the following edit is a sample df and data types

    date   hour                   avg_price  
2016-05-01 00:00:00                  69.5                  
2016-05-01 01:00:00                  67.0                  
2016-05-01 02:00:00                  66.0                  
2016-05-01 03:00:00                  66.0                
2016-05-01 04:00:00                  65.0                  
2016-05-01 05:00:00                  65.0                  
2016-05-01 06:00:00                  65.5                 
2016-05-01 07:00:00                  69.0                
2016-05-01 08:00:00                  72.0                  
2016-05-01 09:00:00                  77.0                 
2016-05-01 10:00:00                  80.0                  
2016-05-01 11:00:00                  81.0                 
2016-05-01 12:00:00                  82.0                  
2016-05-01 13:00:00                  85.0                  
2016-05-01 14:00:00                  85.0                  
2016-05-01 15:00:00                  85.0                  
2016-05-01 16:00:00                  88.0                  
2016-05-01 17:00:00                  87.0                  
2016-05-01 18:00:00                  86.0                  
2016-05-01 19:00:00                  81.0                  
2016-05-01 20:00:00                  79.0                  
2016-05-01 21:00:00                  78.0                  
2016-05-01 22:00:00                  76.0                  
2016-05-01 23:00:00                  74.0                  
2016-05-02 00:00:00                  73.0                   
2016-05-02 01:00:00                  68.0                  
2016-05-02 02:00:00                  66.0                   
2016-05-02 03:00:00                  66.0                   
2016-05-02 04:00:00                  64.0                  
2016-05-02 05:00:00                  67.0                  

and the data types are:

date                    datetime64[ns]
hour                    timedelta64[ns]
avg_price               float64
i.n.n.m
  • 2,936
  • 7
  • 27
  • 51
  • 1
    Does `df['m_hours'] = ((df['hour'] >= '01:00:00') & (df['hour'] <= '04:00:00')).astype(int)` work for you? – lanery Jul 05 '17 at 18:36
  • @lanery, I tried that, it did not work, I get a suggestion to use `Try using .loc[row_indexer,col_indexer] = value instead`, then i converted to `pd.to_timedelta` still getting the errors mentioned in the question. – i.n.n.m Jul 05 '17 at 18:40
  • can you give a sample df. – shivsn Jul 05 '17 at 18:40
  • I think it would be more helpful if you provided a sample of your dataframe (just the first 5-10 rows) and added that to your question. I think the issue is probably that your `df['hour']` column is `timedelta64[ns]` as opposed to `datetime64[ns]` – lanery Jul 05 '17 at 18:47
  • @lanery a sample df is added in the question. – i.n.n.m Jul 05 '17 at 19:00
  • @inja, Sorry but I cannot reproduce your issue when I load your data and set those dtypes – lanery Jul 05 '17 at 20:23
  • @lanery, I found the bug and posted an answer in case if anyone else come across this issue in the future. This works fine for me now. Thank you for your time :) – i.n.n.m Jul 05 '17 at 20:28

1 Answers1

0

I thought of posting an answer to my own question after doing a little more research and hope this might help other if anyone come across this issue. I tried the following code without changing the type of my object,

df['m_hour'] = np.where((df['hour'] >= '01:00:00') & (df['hour'] <= '04:00:00'),'1','0')

I was missing a parenthesis after np.where clause.

i.n.n.m
  • 2,936
  • 7
  • 27
  • 51