Getting maximum consecutive sum of nulls per each group

Question

I have a Dataframe df which looks like this:

col1      col2      col3      col4

 A         A     2017-07-18     2
 A         A     2017-07-25    NaN
 A         A     2017-08-01    NaN
 A         A     2017-08-08    NaN
 A         B     2017-07-18    NaN
 A         B     2017-07-25    2.75
 A         B     2017-08-01    NaN
 A         B     2017-08-08    NaN
 B         B     2017-07-18    3.25
 B         B     2017-07-25    3.34
 B         B     2017-08-01    3.58
 B         B     2017-08-08    3.25
 B         D     2017-07-18    1.28
 B         D     2017-07-25    2.34
 B         D     2017-08-01    NaN
 B         D     2017-08-08    2.65
 B         D     2017-08-15    2.63

Now from this, I want to create a new dataframe df2 which will show me the col1 and col2 as well as a new column which will count the max number of consecutive NaN in df.col4. My desired result will be like that:

col1     col2      new_col

  A        A          3
  A        B          2
  B        B          0
  B        D          1

The thing I tried to do is to use shift() and some boolean masking to get the rows for which I have more or equal than 2 consecutive nulls, but the result dataframe that I got is nowhere near what I'm trying to achieve. Any thoughts?

score 0 · Accepted Answer · answered Aug 23 '17 at 11:22

A bit convoluted way of doing it, since you can't do NaN equality check in shift, we'll create a temporary isnull

In [827]: (df.assign(null=df.col4.isnull()).groupby(['col1', 'col2'])
             .null
             .apply(lambda x: (
                    x * (x.groupby((x != x.shift()).cumsum()).cumcount() + 1)).max())
             .reset_index())
Out[827]:
  col1 col2  null
0    A    A     3
1    A    B     2
2    B    B     0
3    B    D     1

Refer to https://stackoverflow.com/a/27626699 on how consecutive groups are found.

score -1 · Answer 2 · answered Aug 23 '17 at 10:52

-1

I would suggest this algorithm

1)make a dictionary: {(col1,col2):numberOfConsecutiveNANs}

2)read line by line and count the consecutive NANs

3) compare to the number in the dictionary, if greater -> ovvewrite the number

answered Aug 23 '17 at 10:52

Superluminal

947
10
23

Getting maximum consecutive sum of nulls per each group

2 Answers2