I have a question related to an earlier question: "Identifying consecutive NaN's with pandas".

I am new to Stack Overflow, so I cannot add a comment there, but I would like to know how I can keep part of the DataFrame's original index when counting the number of consecutive NaNs.

So instead of:

import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, np.nan, np.nan, np.nan, 6, 7, 8, 9, 10, np.nan, np.nan, 13, 14]})
df
Out[38]:
     a
0    1
1    2
2  NaN
3  NaN
4  NaN
5    6
6    7
7    8
8    9
9   10
10 NaN
11 NaN
12  13
13  14

I would like to obtain the following:


Out[41]:
     a
0    0
1    0
2    3
5    0
6    0
7    0
8    0
9    0
10   2
12   0
13   0
Thomsma
    You should change your question title to `How to keep original indexes after grouping`. Also, look at this [question](https://stackoverflow.com/questions/49216357/how-to-keep-original-index-of-a-dataframe-after-groupby-2-columns) – Jonas Palačionis Jan 21 '21 at 12:20
  • 1
    Does this answer your question? [How to keep original index of a DataFrame after groupby 2 columns?](https://stackoverflow.com/questions/49216357/how-to-keep-original-index-of-a-dataframe-after-groupby-2-columns) – Jonas Palačionis Jan 21 '21 at 12:21
  • 1
    @ScottBoston I will edit the question to clarify this. – Thomsma Jan 21 '21 at 15:03

1 Answer

I have found a workaround. It is quite ugly, but it does the trick. I hope you don't have a massive amount of data, because it may not perform well:

import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, np.nan, np.nan, np.nan, 6, 7, 8, 9, 10, np.nan, np.nan, 13, 14]})

# Count the NaNs per group; a new group starts at every non-NaN value.
df1 = df.a.isnull().astype(int).groupby(df.a.notnull().astype(int).cumsum()).sum()

# Number the NaNs within each run (1, 2, 3, ...); non-NaN rows get 0.
# We only want to keep the 0's (non-NaN values) and the 1's (the first NaN of each run).
b = df.isna()
df2 = b.cumsum() - b.cumsum().where(~b).ffill().fillna(0).astype(int)
df2 = df2.loc[df2['a'] <= 1].copy()  # copy so that update() below works on an independent frame

# Move the non-zero NaN counts onto the index of the first NaN of each run
df3 = df1.loc[df1 != 0]
df3.index = df2.loc[df2['a'] == 1].index

# Overwrite the 1-markers in df2 with the values from df3 (which has the right values and the right index)
df2.update(df3)

The NaN-grouping step is inspired by another answer.
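For comparison, here is a more compact sketch that aims at the same result in one pass. It is not the code above but a variant: it labels runs by detecting where the NaN flag changes, and uses `groupby(...).transform("sum")` so the original index is never lost. The function name `nan_run_lengths` is made up for illustration, and I have only checked it against the example data from the question.

```python
import numpy as np
import pandas as pd


def nan_run_lengths(s: pd.Series) -> pd.Series:
    """Hypothetical helper: NaN run length at the first NaN of each run, 0 elsewhere."""
    is_nan = s.isna()
    # True wherever the NaN flag differs from the previous row, i.e. a run boundary.
    change = is_nan != is_nan.shift(fill_value=False)
    # Consecutive rows with the same flag share one run id.
    run_id = change.cumsum()
    # Number of NaNs in each run, broadcast back to every row of that run.
    run_len = is_nan.astype(int).groupby(run_id).transform("sum")
    # Keep every non-NaN row plus the first row of each NaN run.
    first_of_run = is_nan & change
    return run_len[~is_nan | first_of_run]


df = pd.DataFrame({'a': [1, 2, np.nan, np.nan, np.nan, 6, 7, 8, 9, 10, np.nan, np.nan, 13, 14]})
result = nan_run_lengths(df['a'])
print(result)
```

On the example data this keeps index labels 0, 1, 2, 5, 6, 7, 8, 9, 10, 12, 13, with the value 3 at index 2 and 2 at index 10. Because `transform` returns a Series aligned to the input, there is no need to rebuild the index afterwards.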

Mathias711
  • Wow, I'm surprised this isn't easier to do than this. But it appears not to be, using pandas; maybe with numpy it could be. +1 – Scott Boston Jan 21 '21 at 15:36