I have a number of dataframes that look like the example below:
- each row is a period of observation, each column is a quantity I am measuring (e.g. rainfall in New York)
- some columns start and end with NaNs
- most columns also have NaNs between the first and the last populated value

Desired output: for each column, convert the NaNs between the first and the last valid value to 0, without modifying the NaNs at the beginning and at the end.
Example: convert [nan, 7, 6, nan, 9, nan] to [nan, 7, 6, 0, 9, nan]
What I have tried:
Based on Locate first and last non NaN values in a Pandas DataFrame, it is easy to find the first and the last non-null value of each column. I can therefore loop through the columns and, for each column, loop over the rows between its first and last non-null values, replacing NaNs with zero. It works, but it is of course not vectorised, slow and inefficient.
Can you think of a better alternative?
I could, for each column, query the dataframe to find the NaNs between the first and last valid values, but I would still have to loop through all the columns.
import numpy as np
import pandas as pd

np.random.seed(5)

rows = 20
df = pd.DataFrame(index=np.arange(0, rows), columns=['New York', 'London', 'Paris'], data=np.random.rand(rows, 3))
# leading NaNs (must stay NaN)
df.iloc[0:2, 0] = np.nan
df.iloc[0:3, 1] = np.nan
# trailing NaNs (must stay NaN)
df.iloc[-3:, 0] = np.nan
df.iloc[-2:, 1] = np.nan
# interior NaNs (must become 0)
df.iloc[7, 0] = np.nan
df.iloc[10, 0] = np.nan
df.iloc[9, 1] = np.nan
df.iloc[11, 2] = np.nan
first_notna = df.apply(pd.Series.first_valid_index)
last_notna = df.apply(pd.Series.last_valid_index)

out = df.copy()
for numcol, col in enumerate(df.columns):
    # only scan the rows between this column's first and last valid values
    for r in np.arange(first_notna.loc[col], last_notna.loc[col] + 1):
        if np.isnan(df.iloc[r, numcol]):
            out.iloc[r, numcol] = 0
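For reference, the kind of vectorised approach I am hoping for might look something like this sketch: a position is "interior" exactly when it has a non-null value both before it (so `ffill` produces a value) and after it (so `bfill` produces a value), which gives a boolean mask with no explicit Python loop.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [np.nan, 7, 6, np.nan, 9, np.nan]})

# True only between the first and last valid value of each column
inner = df.ffill().notna() & df.bfill().notna()

# Fill every NaN with 0, then restore NaN outside the interior span
out = df.fillna(0).where(inner)
# out["x"] → [nan, 7.0, 6.0, 0.0, 9.0, nan]
```

This operates on all columns at once, so it should avoid the per-column Python loop entirely.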