I have a Dataframe df which looks like this:
col1 col2 col3 col4
A A 2017-07-18 2
A A 2017-07-25 NaN
A A 2017-08-01 NaN
A A 2017-08-08 NaN
A B 2017-07-18 NaN
A B 2017-07-25 2.75
A B 2017-08-01 NaN
A B 2017-08-08 NaN
B B 2017-07-18 3.25
B B 2017-07-25 3.34
B B 2017-08-01 3.58
B B 2017-08-08 3.25
B D 2017-07-18 1.28
B D 2017-07-25 2.34
B D 2017-08-01 NaN
B D 2017-08-08 2.65
B D 2017-08-15 2.63
Now from this, I want to create a new dataframe df2 which will show me the col1 and col2 as well as a new column which will count the max number of consecutive NaN in df.col4. My desired result will be like that:
col1 col2 new_col
A A 3
A B 2
B B 0
B D 1
The thing I tried to do is to use shift() and some boolean masking to get the rows for which I have more or equal than 2 consecutive nulls, but the result dataframe that I got is nowhere near what I'm trying to achieve. Any thoughts?