
I have a dataframe with a datetime index, and several lists of dates, each labelled with a condition. I want to compare each date in the dataframe against these lists and assign the matching label string to the row.

df = 
  index                   data
2019-02-04 14:52:00    73.923746
2019-02-05 10:48:00    73.335315
2019-02-05 11:28:00    72.021457
2019-02-06 10:49:00    72.367468
2019-02-07 10:16:00    73.434296
2019-02-14 10:54:00    73.094386
2019-02-27 12:08:00    70.930997
2019-02-28 12:41:00    70.444107
2019-02-28 13:21:00    70.426729
2019-03-29 11:29:00    70.758032
2019-04-29 11:29:00    70.758032
2019-12-14 14:30:00    73.515568
2019-12-23 10:54:00    72.812583

bad_dates = [dates_bwn_twodates('2019-03-22','2019-04-09'),'bad_day']
good_dates= [dates_bwn_twodates('2019-4-10','2019-4-29'),'good_day']

explist = [bad_dates,good_dates]

I want to compare each index value in df with the two lists above and produce a new column indicating the condition of the day. My present code:

df['test'] =  'normal_day'
for i in explist:
    for j in df.index:
        if bool(set(i[0])&set(j.strftime('%Y-%m-%d'))) == True:
            df['test'].loc[j] = i[1]

My present output is

  index                   data       test 
2019-02-04 14:52:00    73.923746     normal_day 
2019-02-05 10:48:00    73.335315     normal_day 
2019-02-05 11:28:00    72.021457     normal_day 
2019-02-06 10:49:00    72.367468     normal_day 
2019-02-07 10:16:00    73.434296     normal_day 
2019-02-14 10:54:00    73.094386     normal_day 
2019-02-27 12:08:00    70.930997     normal_day 
2019-02-28 12:41:00    70.444107     normal_day 
2019-02-28 13:21:00    70.426729     normal_day 
2019-03-29 11:29:00    70.758032     normal_day 
2019-04-29 11:29:00    70.758032     normal_day 
2019-12-14 14:30:00    73.515568     normal_day 
2019-12-23 10:54:00    72.812583     normal_day 

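My code is not working properly: every row is labelled normal_day, even the rows whose dates fall inside the bad_day or good_day ranges (2019-03-29 and 2019-04-29).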

Mainland
  • What does _my code is not working properly_ mean, exactly? Why would you use loops for this? Why the `if ... == True:`? Have you not read the pandas docs? – AMC Jan 23 '20 at 21:31
  • Does this answer your question? [Pandas conditional creation of a series/dataframe column](https://stackoverflow.com/questions/19913659/pandas-conditional-creation-of-a-series-dataframe-column) – AMC Jan 23 '20 at 21:32

1 Answer


Create the masks (this assumes the dates are in a column named `index` with dtype `datetime64`):

bad = df['index'].between('2019-03-22', '2019-04-09')
good = df['index'].between('2019-04-10', '2019-04-29')

Then assign the labels:

df['test'] =  'normal_day'
df.loc[bad, 'test'] = 'bad_day'
df.loc[good, 'test'] = 'good_day'
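
If the dates are the dataframe's actual DatetimeIndex rather than a column (as the question's loop over `df.index` suggests), the same idea can be written with comparisons on the index. The following is only a minimal sketch under that assumption: the `ranges` list is a hypothetical stand-in for `explist`, and the sample rows are a subset of the question's data. The index is normalized to midnight first so that the end date covers the whole day, because a row stamped 2019-04-29 11:29 is later than the bound '2019-04-29', which is read as midnight.

```python
import pandas as pd

# A few rows from the question; the dates are the index, not a column.
df = pd.DataFrame(
    {"data": [73.923746, 70.444107, 70.758032, 70.758032, 73.515568]},
    index=pd.to_datetime([
        "2019-02-04 14:52:00",
        "2019-02-28 12:41:00",
        "2019-03-29 11:29:00",
        "2019-04-29 11:29:00",
        "2019-12-14 14:30:00",
    ]),
)

# Hypothetical layout (label, first day, last day) standing in for explist.
ranges = [
    ("bad_day", "2019-03-22", "2019-04-09"),
    ("good_day", "2019-04-10", "2019-04-29"),
]

# Drop the time of day so the comparison is made on calendar dates only.
days = df.index.normalize()

df["test"] = "normal_day"
for label, start, end in ranges:
    # Boolean mask: True where the row's date lies in [start, end].
    mask = (days >= pd.Timestamp(start)) & (days <= pd.Timestamp(end))
    df.loc[mask, "test"] = label

print(df)
```

Assigning through `df.loc[mask, 'test']` also avoids the chained `df['test'].loc[j] = ...` assignment used in the question, which pandas may apply to a temporary copy.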
Kenan
  • Your solution is so simple and elegant. I got an error: `AttributeError: 'DatetimeIndex' object has no attribute 'between'` – Mainland Jan 23 '20 at 20:58
  • I found this approach: `mask = (df['date'] > start_date) & (df['date'] <= end_date)`. Thanks. – Mainland Jan 23 '20 at 21:07
  • You could also convert to `str` to use between, `df['index'].astype(str).between(...)`, or use [between_time](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.between_time.html) – Kenan Jan 23 '20 at 21:08
  • I am trying to use `between_time`. It looks better than masking. But I am getting an error for `df.between_time(pd.to_datetime('2019-04-30'),pd.to_datetime('2019-05-09'))`: `ValueError: Cannot convert arg [Timestamp('2019-04-30 00:00:00')] to a time` – Mainland Jan 23 '20 at 21:18
  • If your `df['index'].dtype` is `datetime64`, `between` should work fine: `df['index'].between('2019-02-05', '2019-04-28')` – Kenan Jan 23 '20 at 21:31
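
On the `between_time` error in the comments: `between_time` filters rows by time of day and ignores the date, which is why it rejects a full Timestamp. For a date range on a DatetimeIndex, one option is label slicing with date strings through `.loc`. The sketch below assumes the index is sorted and uses a few rows from the question. With date-only strings, both endpoints are inclusive and the end date covers the whole day.

```python
import pandas as pd

# A few rows from the question, with the dates as a sorted DatetimeIndex.
df = pd.DataFrame(
    {"data": [73.335315, 72.021457, 70.758032, 70.758032]},
    index=pd.to_datetime([
        "2019-02-05 10:48:00",
        "2019-02-05 11:28:00",
        "2019-03-29 11:29:00",
        "2019-04-29 11:29:00",
    ]),
)

# between_time selects on the clock time only, whatever the date is.
late_morning = df.between_time("11:00", "12:00")

# Date-string slices select whole days on a sorted DatetimeIndex.
df["test"] = "normal_day"
df.loc["2019-03-22":"2019-04-09", "test"] = "bad_day"
df.loc["2019-04-10":"2019-04-29", "test"] = "good_day"

print(late_morning)
print(df)
```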