Identifying consecutive NaN's with pandas part 2

Question

This is a follow-up to the earlier question: Identifying consecutive NaN's with pandas

I am new to Stack Overflow, so I cannot comment there, but I would like to know how to partly keep the original index of the DataFrame when counting the number of consecutive NaNs.

So instead of:

df = pd.DataFrame({'a': [1, 2, np.nan, np.nan, np.nan, 6, 7, 8, 9, 10, np.nan, np.nan, 13, 14]})
df
Out[38]:
     a
0    1
1    2
2  NaN
3  NaN
4  NaN
5    6
6    7
7    8
8    9
9   10
10 NaN
11 NaN
12  13
13  14

I would like to obtain the following:

You should change your question title to `How to keep original indexes after grouping`. Also, look at this [question](https://stackoverflow.com/questions/49216357/how-to-keep-original-index-of-a-dataframe-after-groupby-2-columns) — Jonas Palačionis, Jan 21 '21 at 12:20
Does this answer your question? [How to keep original index of a DataFrame after groupby 2 columns?](https://stackoverflow.com/questions/49216357/how-to-keep-original-index-of-a-dataframe-after-groupby-2-columns) — Jonas Palačionis, Jan 21 '21 at 12:21

Mathias711 · Accepted Answer · 2021-01-21T14:40:22.390

I have found a workaround. It is quite ugly, but it does the trick. I hope you don't have massive data, because it might not perform very well:

import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, np.nan, np.nan, np.nan, 6, 7, 8, 9, 10, np.nan, np.nan, 13, 14]})

# Count the NaNs in each run; groups are keyed by the running count of non-NaN values.
df1 = df.a.isnull().astype(int).groupby(df.a.notnull().astype(int).cumsum()).sum()

# Number each NaN within its run (1, 2, 3, ...); non-NaN rows get 0.
# We only want to keep the non-NaN rows (0) and the first NaN of each run (1).
b = df.isna()
df2 = b.cumsum() - b.cumsum().where(~b).ffill().fillna(0).astype(int)
df2 = df2.loc[df2['a'] <= 1].copy()  # copy so update() below works on an independent frame

# Keep the non-zero NaN counts and set their index to the index of the first NaN of each run.
df3 = df1.loc[df1 != 0]
df3.index = df2.loc[df2['a'] == 1].index

# Write the counts from df3 (right values, right index) into df2.
df2.update(df3)

The NaN-group part is adapted from another answer.
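For comparison, here is a shorter sketch that aims at the same result by labelling each run of equal NaN/non-NaN flags and keeping only the first row of every NaN run. It is not the code from the answer above; the names `is_nan`, `run_id`, `nan_counts`, `is_run_start` and `result` are made up for illustration, and it assumes a single column `a` as in the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, np.nan, np.nan, np.nan, 6, 7, 8, 9, 10, np.nan, np.nan, 13, 14]})

is_nan = df['a'].isna()

# A new run starts wherever the NaN flag differs from the previous row.
run_id = (is_nan != is_nan.shift()).cumsum()

# Number of NaNs in the run each row belongs to (0 for non-NaN runs).
nan_counts = is_nan.astype(int).groupby(run_id).transform('sum')

# True on the first row of every run.
is_run_start = run_id != run_id.shift()

# Keep every non-NaN row plus the first row of each NaN run.
result = nan_counts[~is_nan | is_run_start]
print(result)
```

With the sample data, `result` keeps the original labels 0, 1, 2, 5, 6, 7, 8, 9, 10, 12 and 13, with the value 3 at index 2, 2 at index 10, and 0 everywhere else.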

Wow, I'm so surprised this isn't easier to do than this. But it appears not to be with pandas; maybe with numpy it could be. +1 — Scott Boston, Jan 21 '21 at 15:36
