0

This is my DataFrame:

             Date    Bool
0      2020-09-02   False
1      2020-09-03   False
2      2020-09-04   True
3      2020-09-05   True
4      2020-09-06   False
          ...       
3034   2028-12-28   True
3035   2028-12-29   False
3036   2028-12-30   True
3037   2028-12-31   False
3038   2029-01-01   False

If between True values is 1 calendar day or 2 calendar days, I want to filter them. Only second True value is valid.
So it will be:

             Date    Bool   Bool_Filter
0      2020-09-02   False   False
1      2020-09-03   False   False
2      2020-09-04   True    False
3      2020-09-05   True    True
4      2020-09-06   False   False
          ...           
3034   2028-12-28   False   False
3035   2028-12-29   True    False
3036   2028-12-30   True    True
3037   2028-12-31   False   False
3038   2029-01-01   False   False

How to do it?

Igor K.
  • 813
  • 6
  • 17
  • 1
    Does this answer your question? [Filtering Pandas DataFrames on dates](https://stackoverflow.com/questions/22898824/filtering-pandas-dataframes-on-dates) – felipsmartins Sep 08 '20 at 18:41
  • The question is not clear to me now ~ – BENY Sep 08 '20 at 18:50
  • @BEN_YO I've edited question – Igor K. Sep 08 '20 at 18:52
  • 1
    `df['Bool_filter'] = (df['Date'].diff().between(min, max) & df['Bool']`. Edit `min` and `max` to be a timedelta of your choice. See https://docs.python.org/3/library/datetime.html – deponovo Sep 08 '20 at 19:40
  • @deponovo min and max - what variables are? integers like 0, 1 or 2 (number of days)? – Igor K. Sep 09 '20 at 07:19
  • Since the `.diff()` will produce a timedelta dtype, `min` and `max` must be `timedelta`. Eg. you want to filter out differences between 0 and 10s: `min = datetime.timedelta(0, 0)` and `max = datetime(0, 10)`. Please check the link I posted in the previous comment. – deponovo Sep 09 '20 at 07:31

1 Answers1

1

You can use the shift function to achieve this.

From your examples it's not clear what happens when there are more than 2 consecutive true values, but this matches the example solution your provided:

import pandas as pd
from pandas.core.series import Series
from datetime import date

data = [[date(2020, 9, 2), False], [date(2020, 9, 3), False],
        [date(2020, 9, 4), True], [date(2020, 9, 5), True],
        [date(2020, 9, 6), False], [date(2028, 12, 28), False],
        [date(2028, 12, 29), True], [date(2028, 12, 30), True],
        [date(2028, 12, 31), False], [date(2029, 1, 1), False]]

df = pd.DataFrame(data, columns=['Date', 'Bool'])

df['Bool_Filter'] = (
    df['Bool'] & \
    (df['Bool'].shift(1)
     & ((df['Date'].shift(1) - df['Date']) <= '2 days'))
) 
Diego Veralli
  • 1,011
  • 6
  • 13