How can I replace the NaN values in my dataset with the average of the 5 preceding values?

Column A Column B
1 2
2 5
3 5
4 2
5 2
NaN 2
NaN 2
1 2
1 2
1 2
1 NaN
1 2
1 2

For example, in this case the first NaN in Column A should become the average of (1, 2, 3, 4, 5), and the second NaN should become the average of (2, 3, 4, 5, and the value filled in for the first NaN).

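I have tried the following, but it fills every NaN with the mean of the entire column rather than of the 5 preceding values: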

df.fillna(df.mean())
  • This seems to be the same question as: https://stackoverflow.com/questions/49172914/how-to-fill-nan-values-with-rolling-mean-in-pandas – Glauco Nov 03 '22 at 15:26
  • @Glauco Maybe, but when I use 'rolling_mean' I get AttributeError: module 'pandas' has no attribute 'rolling_mean' – Vitor Fernandes Nov 03 '22 at 15:44
  • I tried dadosteste = dadosteste.fillna(dadosteste.rolling(6, min_periods=1).mean()) – Vitor Fernandes Nov 03 '22 at 15:49
  • It seems rolling_mean was deprecated and has since been removed; here is the new recipe: https://stackoverflow.com/questions/36274447/pd-rolling-mean-becoming-deprecated-alternatives-for-ndarrays – Glauco Nov 03 '22 at 16:07

1 Answer

As mentioned in the comments, this has been answered in the linked question, but the updated version for recent pandas releases is as follows:


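import numpy as np
import pandas as pd

data = {'col1': [1, 2, 3, 4, 5, np.nan, np.nan, 1, 1, 1, 1, 1, 1],
        'col2': [2, 5, 5, 2, 2, 2, 2, 2, 2, 2, np.nan, 2, 2]}
df = pd.DataFrame(data)

window_size = 5
# The window spans the current row plus the 5 rows before it; mean() skips NaN,
# so each NaN is filled with the average of the non-NaN values in that span.
df = df.fillna(df.rolling(window_size + 1, min_periods=1).mean())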

outputs:

    col1  col2
0    1.0   2.0
1    2.0   5.0
2    3.0   5.0
3    4.0   2.0
4    5.0   2.0
5    3.0   2.0
6    3.5   2.0
7    1.0   2.0
8    1.0   2.0
9    1.0   2.0
10   1.0   2.0
11   1.0   2.0
12   1.0   2.0
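
Note that row 6 of col1 comes out as 3.5, the mean of (2, 3, 4, 5), because the rolling window only sees the original values and skips both NaNs. If earlier fills should feed into later ones, as the question describes, one option is to fill sequentially. The following is only a minimal sketch under that assumption; fill_with_trailing_mean is a hypothetical helper name, not a pandas function, and an explicit loop like this may be slow on large data:

```python
import numpy as np
import pandas as pd


def fill_with_trailing_mean(series, window=5):
    """Hypothetical helper: fill NaNs top to bottom with the mean of up to
    `window` preceding values, reusing values filled earlier in the pass."""
    values = series.to_numpy(dtype=float).copy()
    for i in range(len(values)):
        if np.isnan(values[i]):
            previous = values[max(0, i - window):i]
            previous = previous[~np.isnan(previous)]
            if previous.size:  # leading NaNs have no history and stay NaN
                values[i] = previous.mean()
    return pd.Series(values, index=series.index, name=series.name)


df = pd.DataFrame({
    'col1': [1, 2, 3, 4, 5, np.nan, np.nan, 1, 1, 1, 1, 1, 1],
    'col2': [2, 5, 5, 2, 2, 2, 2, 2, 2, 2, np.nan, 2, 2],
})

# Apply the helper column by column
filled = df.apply(fill_with_trailing_mean)

# Rows 5 and 6 of col1: mean(1,2,3,4,5) = 3.0, then mean(2,3,4,5,3.0) = 3.4
print(filled.loc[[5, 6], 'col1'])
```

With this approach the second NaN becomes 3.4 instead of 3.5, which matches the behaviour asked for in the question.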
Mat.B
  • What can I do if the np.nan is in the first row? – Vitor Fernandes Nov 03 '22 at 17:06
  • Well, I'm not sure... what would the strategy be in that case? Take the mean of the next 5 rows? If the dataset is "large", I would handle the first rows "by hand", I guess; I cannot see any general method that would take care of all the cases. – Mat.B Nov 03 '22 at 17:32
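
One possible strategy for a NaN in the first rows, along the lines suggested in the last comment, is to fall back to the mean of the rows that follow. This is only a sketch, assuming that fallback is acceptable for the data; the example frame and the variable names trailing and leading are made up for illustration:

```python
import numpy as np
import pandas as pd

# Made-up example data with a NaN in the first row
df = pd.DataFrame({'col1': [np.nan, 1, 2, 3, 4, 5, 6]})
window_size = 5

# Backward-looking mean: current row plus the 5 rows before it (NaN skipped)
trailing = df.rolling(window_size + 1, min_periods=1).mean()

# Forward-looking mean: the same rolling call on the reversed frame, flipped back
leading = df[::-1].rolling(window_size + 1, min_periods=1).mean()[::-1]

# Prefer the trailing mean; use the forward mean only where no earlier data exists
filled = df.fillna(trailing).fillna(leading)
print(filled)
```

Here the first row has no earlier values, so the trailing mean is still NaN and the fill comes from the next 5 rows instead (1 through 5, giving 3.0).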