0

Suppose I have daily data from 2010 until 2020:

Example:

Date            col1
2010-01-01      False
2010-01-02      False
...
2020-12-31      False

I want to set col1 = True for all rows where (month equals 4 and day is greater than 25) or (month equals 5 and day is less than 5). In other words, in every year, I want to set col1 = True for all dates between the 25th day of the 4th month and the 5th day of the 5th month.

How can I do it?

Daniel Yefimov
  • https://stackoverflow.com/questions/56688364/efficiently-check-if-dataframe-has-date-between-a-range-and-return-a-count – tjaqu787 Sep 21 '21 at 18:18

2 Answers

2

You can just use .dt.month and .dt.day to access month and day from the date and then create the conditions from that:

df.loc[
  ((df.Date.dt.month == 4) & (df.Date.dt.day > 25)) | 
  ((df.Date.dt.month == 5) & (df.Date.dt.day < 5)), 
'col1'] = True

This assumes the Date column already has a datetime type. If it does not, you can convert it with:

df.Date = pd.to_datetime(df.Date)
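
To check the approach end to end, here is a minimal sketch on synthetic data. It assumes the dates start out as strings, that the frame and column names match the question (df, Date, col1), and that the bounds are exclusive as in the question (day > 25, day < 5). The month, day, mask and flagged variables are illustrative names only.

```python
import pandas as pd

# Synthetic daily data for 2010-2020, stored as strings to mimic unparsed input.
dates = pd.date_range("2010-01-01", "2020-12-31", freq="D").strftime("%Y-%m-%d")
df = pd.DataFrame({"Date": dates})
df["col1"] = False

# Convert to datetime so the .dt accessor is available.
df["Date"] = pd.to_datetime(df["Date"])

month = df["Date"].dt.month
day = df["Date"].dt.day

# April 26-30 or May 1-4, in every year (both bounds exclusive).
mask = ((month == 4) & (day > 25)) | ((month == 5) & (day < 5))
df.loc[mask, "col1"] = True

# Dates that were flagged, for inspection.
flagged = df.loc[df["col1"], "Date"]
print(flagged.head(10))
```

Computing month and day once and reusing them keeps the condition readable and avoids repeating the .dt lookups.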
Psidom
  • +10, i think this does work for the given bounds, but would probably get unwieldy with other ranges like 04-25 to 08-05 (since the inner months would have no day restrictions?) – tdy Sep 21 '21 at 19:08
  • 1
    i guess 04-25 to 08-05 would be something like `((df.Date.dt.month == 4) & (df.Date.dt.day > 25)) | ((df.Date.dt.month > 4) & (df.Date.dt.month < 8)) | ((df.Date.dt.month == 8) & (df.Date.dt.day < 5))` – tdy Sep 21 '21 at 19:11
  • 1
    @tdy Yeah, in that case we can add another or condition to say `df.Date.dt.month.between(5, 7)`. Might not be the most concise though. – Psidom Sep 21 '21 at 19:12
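
For wider ranges such as the 04-25 to 08-05 case discussed in these comments, one possible alternative (not from the answer itself) is to encode month and day as a single integer so the inner months need no special handling. This is a sketch under assumptions: month_day_key is a hypothetical helper, the bounds are exclusive, and the range does not wrap around the new year.

```python
import pandas as pd


def month_day_key(dates: pd.Series) -> pd.Series:
    """Encode each date as month * 100 + day, e.g. April 25 -> 425."""
    return dates.dt.month * 100 + dates.dt.day


df = pd.DataFrame({"Date": pd.date_range("2010-01-01", "2020-12-31", freq="D")})
df["col1"] = False

key = month_day_key(df["Date"])

# Strictly after April 25 and strictly before August 5, in every year.
mask = (key > 425) & (key < 805)
df.loc[mask, "col1"] = True
```

Because the key orders dates the same way the calendar does within a year, a single pair of comparisons covers the partial first and last months and all months in between.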
1

You can create a boolean index and assign it to a new column in your dataframe.

To find rows where the value of Date is between two other values, you can use the between method. between can take an argument inclusive that can be any of "both", "neither", "left", or "right". Use this to dial in the exact interval you want.

To ignore the year and compare only on month and day, the column must have the dtype datetime64. Use the dt accessor with strftime to format each date as a month-day string, then compare the strings. Because '%m-%d' zero-pads both fields, lexicographic order matches calendar order:

col1 = df['Date'].dt.strftime('%m-%d').between(
    '04-25', 
    '05-05', 
    inclusive="neither"
)

To add the boolean index to your dataframe:

df['col1'] = col1

To only set the values in 'col1' for the matching rows, per @tdy in the comments:

df.loc[col1, 'col1'] = True
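
To see how the inclusive argument changes the result, here is a small sketch on synthetic data. It assumes pandas 1.3 or later (where inclusive accepts these strings) and the column names from the question. The month_day, open_interval and closed_interval names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"Date": pd.date_range("2010-01-01", "2020-12-31", freq="D")})
df["col1"] = False

# Zero-padded "MM-DD" strings, so string comparison follows calendar order.
month_day = df["Date"].dt.strftime("%m-%d")

# Excludes both endpoints: April 26 through May 4.
open_interval = month_day.between("04-25", "05-05", inclusive="neither")

# Includes both endpoints: April 25 through May 5.
closed_interval = month_day.between("04-25", "05-05", inclusive="both")

# Set only the matching rows, leaving the rest of col1 unchanged.
df.loc[open_interval, "col1"] = True

print(open_interval.sum(), closed_interval.sum())
```

The closed interval flags two more days per year than the open one, so the choice of inclusive should follow whether April 25 and May 5 themselves belong in the range.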
Angus L'Herrou