I need to write Python code that, for a variable N, finds runs of at least N consecutive rows in a DataFrame column that hold the same non-NaN value, as in the table below. I can't figure out how to do it with a for loop because I don't know which row I'm looking at on each iteration. Any ideas on how to do this?

Fruit | 2 matches | 5 matches |
---|---|---|
Apple | No | No |
NaN | No | No |
Pear | No | No |
Pear | Yes | No |
Pear | Yes | No |
Pear | Yes | No |
Pear | Yes | Yes |
NaN | No | No |
NaN | No | No |
NaN | No | No |
NaN | No | No |
NaN | No | No |
Banana | No | No |
Banana | Yes | No |
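
To make the expected output above concrete, here is a minimal loop-based sketch that rebuilds the table. It is only one way to read the requirement: a row is marked `Yes` once it is at least the N-th row in a run of equal, non-NaN values. The helper name `run_matches` is hypothetical, and the sample data is copied from the table.

```python
import numpy as np
import pandas as pd

# Sample data copied from the table above.
fruit = ["Apple", np.nan, "Pear", "Pear", "Pear", "Pear", "Pear",
         np.nan, np.nan, np.nan, np.nan, np.nan, "Banana", "Banana"]
df = pd.DataFrame({"Fruit": fruit})


def run_matches(values, n):
    """Hypothetical helper: 'Yes' where a row is at least the n-th in a run of equal, non-NaN values."""
    result = []
    previous = None   # value seen on the previous row
    run_length = 0    # length of the current run, 0 right after a NaN
    for value in values:
        if pd.isna(value):
            run_length = 0               # NaN always breaks a run
        elif run_length > 0 and value == previous:
            run_length += 1              # same value as the row before
        else:
            run_length = 1               # a new run starts here
        previous = value
        result.append("Yes" if run_length >= n else "No")
    return result


df["2 matches"] = run_matches(df["Fruit"], 2)
df["5 matches"] = run_matches(df["Fruit"], 5)
print(df)
```

Keeping `previous` and `run_length` as loop state means the loop never needs to know the row position, only what the row before it held.
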

Update: testing the solution by @Corralien:

    counts = (df.groupby(df['Fruit'].ne(df['Fruit'].shift()).cumsum())  # virtual groups
                .transform('cumcount').add(1)                           # cumulative counter
                .where(df['Fruit'].notna(), other=0))                   # set NaN to 0
    N = 2
    df['Matches'] = df.where(counts >= N, other='No')

When I execute the last line, VSCode shows the message 'Frame skipped from debugging during step-in.' and an exception is raised in the previous for loop.
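
One thing that may be worth checking (an assumption, not a confirmed diagnosis of the exception) is that `df.where(...)` returns a whole DataFrame rather than a single column of Yes/No labels. The sketch below is a variant that works on the `Fruit` Series only and builds the labels with `numpy.where`. The sample data is a shortened, made-up version of the table, and `run_id` is a name introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Shortened sample data (assumption: same shape as the real column).
df = pd.DataFrame({"Fruit": ["Apple", np.nan, "Pear", "Pear", "Pear",
                             np.nan, "Banana", "Banana"]})

fruit = df["Fruit"]
run_id = fruit.ne(fruit.shift()).cumsum()            # new id whenever the value changes
counts = (fruit.groupby(run_id).cumcount().add(1)    # position within the run, starting at 1
               .where(fruit.notna(), other=0))       # NaN rows never count

N = 2
# Label each row from the counter itself instead of masking the DataFrame.
df["Matches"] = np.where(counts >= N, "Yes", "No")
print(df)
```

Because `counts` is a Series aligned with `df`, the final assignment yields exactly one label per row.
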