Filling dataframe with average of previous columns values

Question

I have a dataframe with having 5 columns with having missing values. How do i fill the missing values with taking the average of previous two column values. Here is the sample code for the same.

coh0 = [0.5, 0.3, 0.1, 0.2,0.2] 
coh1 = [0.4,0.3,0.6,0.5]
coh2 = [0.2,0.2,0.3]
coh3 = [0.8,0.8]
coh4 = [0.5]

df= pd.DataFrame({'coh0': pd.Series(coh0), 'coh1': pd.Series(coh1),'coh2': pd.Series(coh2), 'coh3': pd.Series(coh3),'coh4': pd.Series(coh4)})
df

Here is the sample output

    coh0coh1coh2coh3coh4
0   0.5 0.4 0.2 0.8 0.5
1   0.3 0.3 0.2 0.8 NaN
2   0.1 0.6 0.3 NaN NaN
3   0.2 0.5 NaN NaN NaN
4   0.2 NaN NaN NaN NaN

Here is the desired result i am looking for.

The NaN value in each column should be replaced by the previous two columns average value at the same position. However for the first NaN value in second column, it will take the default last value of first column.

The sample desired output would be like below.

It would be easier to follow if you showed the expected output instead of just describing it. — fsimonjetz, Jun 10 '22 at 18:12

score 1 · Accepted Answer · answered Jun 10 '22 at 19:06

For the exception you named, the first NaN, you can do

df.iloc[1, -1] = df.iloc[0, -1]

though it doesn't make a difference in this case as the mean of .2 and .8 is .5, anyway.

Either way, the rest is something like a rolling window calculation, except it has to be computed incrementally. Normally, you want to vectorize your operations and avoid iterating over the dataframe, but IMHO this is one of the rarer cases where it's actually appropriate to loop over the columns (cf. this excellent post), i.e.,

compute the row-wise (axis=1) mean of up to two columns left of the current one (df.iloc[:, max(0, i-2):i]),
and fill its NaN values from the resulting series.

for i in range(1, df.shape[1]):
    mean_df = df.iloc[:, max(0, i-2):i].mean(axis=1)
    df.iloc[:, i] = df.iloc[:, i].fillna(mean_df)

which results in

   coh0  coh1  coh2   coh3    coh4
0   0.5   0.4  0.20  0.800  0.5000
1   0.3   0.3  0.20  0.800  0.5000
2   0.1   0.6  0.30  0.450  0.3750
3   0.2   0.5  0.35  0.425  0.3875
4   0.2   0.2  0.20  0.200  0.2000

Hi fsimonjetz, Although your code did not give me a complete solution but it gave me way and I made some additional changes in your code and it worked out.Thank you for providing a proper solution. — Debasis, Jun 14 '22 at 17:25

Filling dataframe with average of previous columns values

1 Answers1