Fast way of calculating number of consecutive nan values in a column

Question

I want to transform my DataFrame into a new DataFrame of the same shape, where each entry is the number of consecutive NaNs starting at that position and counting downward (0 for non-NaN entries), as follows:

IN:

    A       B      
0   0.1880  0.345 
1   0.2510  0.585  
2   NaN     NaN  
3   NaN     NaN 
4   NaN     1.150  
5   0.2300  1.210  
6   0.1670  1.290  
7   0.0835  1.400  
8   0.0418  NaN    
9   0.0209  NaN    
10  NaN     NaN    
11  NaN     NaN    
12  NaN     NaN

OUT:

    A       B      
0   0       0    
1   0       0  
2   3       2  
3   2       1 
4   1       0  
5   0       0 
6   0       0 
7   0       0 
8   0       5    
9   0       4   
10  3       3   
11  2       2 
12  1       1

I was trying to adapt the approach from a similar question: Fast way to get the number of NaNs in a column counted from the last valid value in a DataFrame

Does this answer your question? [Perform a reverse cumulative sum on a numpy array](https://stackoverflow.com/questions/16541618/perform-a-reverse-cumulative-sum-on-a-numpy-array) — Yevhen Kuzmovych, Aug 10 '23 at 14:26

score 2 · Accepted Answer · answered Aug 10 '23 at 14:35

Inspired by this answer: https://stackoverflow.com/a/52718619/3275464

from io import StringIO
import pandas as pd

s = """    A       B      
0   0.1880  0.345 
1   0.2510  0.585  
2   NaN     NaN  
3   NaN     NaN 
4   NaN     1.150  
5   0.2300  1.210  
6   0.1670  1.290  
7   0.0835  1.400  
8   0.0418  NaN    
9   0.0209  NaN    
10  NaN     NaN    
11  NaN     NaN    
12  NaN     NaN    """

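# Raw string so that \s is not treated as an (invalid) string escape
df = pd.read_csv(StringIO(s), engine='python', sep=r'\s+')

# NaN flags, reversed so that the cumulative sum runs from the bottom up
_df = df.isna().iloc[::-1]
b = _df.cumsum()
# Subtract the running total at the last valid value, then restore row order
c = b.sub(b.mask(_df).ffill().fillna(0)).astype(int).iloc[::-1]
print(c)  # gives the output you seem to want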
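
To see why the reversed cumulative sum works, here is a minimal sketch that runs the same steps on a single toy column and names each intermediate. The data and the variable names (`is_na`, `running`, `baseline`, `counts`) are made up for illustration and are not part of the answer above.

```python
import numpy as np
import pandas as pd

# Toy column (hypothetical data): a run of two NaNs, a value, then one NaN
s = pd.Series([1.0, np.nan, np.nan, 2.0, np.nan])

is_na = s.isna().iloc[::-1]   # NaN flags, read from the bottom up
running = is_na.cumsum()      # total NaNs seen so far, from the bottom
# Running total at the most recent valid value; 0 if none has been seen yet
baseline = running.mask(is_na).ffill().fillna(0)
# NaNs seen since the last valid value, i.e. the length of the run so far
counts = running.sub(baseline).astype(int).iloc[::-1]

print(counts.tolist())  # [0, 2, 1, 0, 1]
```

Subtracting `baseline` is what resets the count to zero at every valid value, so each NaN run is counted independently of the runs below it.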

score 1 · Answer 2 · answered Aug 10 '23 at 14:33

You can also transform the DataFrame with an explicit loop that counts the consecutive NaN values in each column and replaces each NaN with its count. This is easy to follow, but it visits every cell in Python, so it will be slow on large frames.

import pandas as pd
import numpy as np

# Create the input DataFrame
data = {
    'A': [0.1880, 0.2510, np.nan, np.nan, np.nan, 0.2300, 0.1670, 0.0835, 0.0418, 0.0209, np.nan, np.nan, np.nan],
    'B': [0.345, 0.585, np.nan, np.nan, 1.150, 1.210, 1.290, 1.400, np.nan, np.nan, np.nan, np.nan, np.nan]
}

df = pd.DataFrame(data)

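# Transform the DataFrame in place
for col in df.columns:
    count = 0  # length of the NaN run seen so far, scanning upward
    # Scan from the last row to the first so that the count at each NaN
    # is the number of consecutive NaNs from that row onward
    for i in reversed(df.index):
        if pd.isna(df.at[i, col]):
            count += 1
            df.at[i, col] = count
        else:
            count = 0
            df.at[i, col] = 0  # valid values become 0 in the expected output

print(df.astype(int))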

score 1 · Answer 3 · answered Aug 10 '23 at 14:40

One option:

tmp = df.notna()

out = tmp.apply(lambda s: s[::-1].groupby(s.ne(s.shift()).cumsum()).cumcount().add(1)
               ).mask(tmp, 0)[::-1]
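
As a sanity check, the sketch below rebuilds the question's data, applies the same logic with the lambda split into a named helper, and compares the result with the expected table from the question. The helper name `count_run_from_end` and the `expected_*` lists are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

nan = np.nan
df = pd.DataFrame({
    'A': [0.1880, 0.2510, nan, nan, nan, 0.2300, 0.1670, 0.0835, 0.0418, 0.0209, nan, nan, nan],
    'B': [0.345, 0.585, nan, nan, 1.150, 1.210, 1.290, 1.400, nan, nan, nan, nan, nan],
})

def count_run_from_end(s):
    # Hypothetical helper; same logic as the lambda in the answer.
    run_id = s.ne(s.shift()).cumsum()  # label each run of identical flags
    # Within each run, number the rows starting from the run's last row
    return s[::-1].groupby(run_id).cumcount().add(1)

tmp = df.notna()
# Zero out the valid positions, then restore the original row order
out = tmp.apply(count_run_from_end).mask(tmp, 0)[::-1]

# Expected values copied from the OUT table in the question
expected_A = [0, 0, 3, 2, 1, 0, 0, 0, 0, 0, 3, 2, 1]
expected_B = [0, 0, 2, 1, 0, 0, 0, 0, 5, 4, 3, 2, 1]
print(out['A'].tolist() == expected_A, out['B'].tolist() == expected_B)
```

The grouping key is built on the original row order while the grouped Series is reversed; pandas aligns the two by index, which is why the count starts at the end of each NaN run.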
