
I want to fill the missing values in two columns with each column's mean. Both columns are of type float64:

df['col1'].dtypes
dtype('float64')
df['col2'].dtypes
dtype('float64')

I tried two approaches. First, I replaced the NaN values with 0:

df.replace(np.nan, 0, inplace=True)

Then I used fillna() with the column mean:

 df['col1']=df['col1'].fillna(df['col1'].mean(), inplace=True)

This returns something like this:

Col1
Nan
Nan
Nan

For the second approach, I skipped replacing NaN with zero and applied the mean imputation directly, which returned "None".
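A minimal reproduction of the symptom, using made-up values because the real dataframe was not posted (only the column names follow the question). It shows that, in pandas 1.x/2.x, `fillna(..., inplace=True)` returns None, so assigning its result back replaces the column with None values of dtype object:

```python
import numpy as np
import pandas as pd

# Hypothetical data; the original dataframe was not posted in the question.
df = pd.DataFrame({"col1": [1.0, np.nan, 3.0], "col2": [np.nan, 2.0, 4.0]})
print(df["col1"].dtype)  # float64

# With inplace=True, fillna fills the Series in place and returns None ...
returned = df["col1"].fillna(df["col1"].mean(), inplace=True)
print(returned)  # None

# ... so the question's one-liner effectively does df["col1"] = None.
df["col1"] = returned
print(df["col1"].tolist())  # [None, None, None]
print(df["col1"].dtype)     # object
```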

I don't understand what is wrong with my implementation. Any help would be appreciated.

Encipher
  • Please provide a [reproducible snippet](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) of your dataframe. – BigBen Jul 17 '23 at 16:04

2 Answers


A possible solution (you need to use skipna=True when calculating the mean):

df['col1'].fillna(df['col1'].mean(skipna=True), inplace=True)
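For reference, `Series.mean()` already skips NaN by default (`skipna=True`), so spelling it out does not change the result. A quick check on a hypothetical Series:

```python
import numpy as np
import pandas as pd

# Hypothetical column with one missing value.
s = pd.Series([1.0, np.nan, 3.0])

print(s.mean())              # 2.0 (NaN skipped by default)
print(s.mean(skipna=True))   # 2.0 (same thing, written explicitly)
print(s.mean(skipna=False))  # nan (NaN propagates into the result)
```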
PaulS
  • I did that, without first filling the NaN values with zero. However, the output did not change; it is still 'None'. – Encipher Jul 17 '23 at 15:45
  • It is hard to diagnose the problem without a piece of your data, @Encipher. For instance, the following works fine: `df = pd.DataFrame({'col1': [1, np.nan, 3], 'col2': ['a', 'b', 'c'], 'col3': [1, 2, 3]}); df['col1'].fillna(df['col1'].mean(skipna=True), inplace=True)` – PaulS Jul 17 '23 at 15:46
  • Another thing I noticed: before any replacement, the dtype was dtype('float64'). After the imputation, when I get None values, the dtype is dtype('O'). – Encipher Jul 17 '23 at 15:51
  • I don't know what the problem is, but I found a workaround: I took that column out of the dataframe on its own and applied the mean imputation to it, and that worked. I don't know how to explain it. `data = df.col1`, `data.fillna(data.mean(), inplace=True)` – Encipher Jul 17 '23 at 15:56
  • You might also try: `df['col1'].replace(np.nan, df['col1'].mean())`. – PaulS Jul 17 '23 at 15:59
  • Thank you, it works. Can you please explain the difference between this command and the previous ones? – Encipher Jul 17 '23 at 16:45
  • Without seeing the data, it is hard to find an explanation, @Encipher. – PaulS Jul 17 '23 at 16:57
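The replace() suggestion in the comments returns a new Series rather than changing df, so its result has to be assigned back to the column (the comment shows only the right-hand side, so the assignment below is an assumption). With no inplace involved, it produces the same values as a plain fillna(). A sketch on hypothetical data:

```python
import numpy as np
import pandas as pd

# Hypothetical data; the column name follows the question.
df = pd.DataFrame({"col1": [1.0, np.nan, 3.0]})
mean = df["col1"].mean()

# Both calls return a new Series; neither touches df by itself.
via_replace = df["col1"].replace(np.nan, mean)
via_fillna = df["col1"].fillna(mean)

print(via_replace.tolist())            # [1.0, 2.0, 3.0]
print(via_replace.equals(via_fillna))  # True

# Assign the result back to update the dataframe.
df["col1"] = via_replace
```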

The problem is in this line from the question:

df['col1']=df['col1'].fillna(df['col1'].mean(), inplace=True)

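Remove the inplace argument, since you already assign the result back to the column. With inplace=True, fillna() fills the Series in place and returns None, so the assignment replaces col1 with None; that is also why the dtype changed from float64 to object (dtype('O')). This is the first mistake.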
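A sketch of two ways to write the fix, on hypothetical data: either drop inplace and keep the assignment, or call fillna on the DataFrame itself with a {column: value} dict, inplace=True and no assignment. Calling inplace=True on `df['col1']` is avoided here because recent pandas releases (2.2+) warn that such chained in-place calls may stop updating df under Copy-on-Write.

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"col1": [1.0, np.nan, 3.0], "col2": [np.nan, 2.0, 4.0]})

# Option 1: no inplace; assign the returned Series back to the column.
df["col1"] = df["col1"].fillna(df["col1"].mean())

# Option 2: fill in place on the DataFrame, mapping column -> fill value.
# There is no assignment here, because this call returns None.
df.fillna({"col2": df["col2"].mean()}, inplace=True)

print(df)
```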

Once this is fixed, the approach from the older question "pandas DataFrame: replace nan values with average of columns" still works:

df.fillna(df.mean())

Verified with a fresh random example:

   col1  col2
0   1.0   NaN
1   0.1   1.0
2   NaN   3.2
3   4.0   NaN
4   8.0   0.0
df.fillna(df.mean())
    col1  col2
0  1.000   1.4
1  0.100   1.0
2  3.275   3.2
3  4.000   1.4
4  8.000   0.0
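The verification above can be reproduced with the sketch below, which rebuilds the example frame from the printed table. One caveat, assuming pandas 2.x: `DataFrame.mean()` no longer silently drops non-numeric columns and raises a TypeError on them, so a mixed-dtype frame needs `numeric_only=True`. The `label` column is hypothetical and only there to show that case.

```python
import numpy as np
import pandas as pd

# The example frame from this answer.
df = pd.DataFrame({
    "col1": [1.0, 0.1, np.nan, 4.0, 8.0],
    "col2": [np.nan, 1.0, 3.2, np.nan, 0.0],
})

# df.mean() gives one NaN-skipping mean per column; fillna aligns on column names.
filled = df.fillna(df.mean())
print(filled)

# Hypothetical non-numeric column: restrict the means to numeric columns.
mixed = df.assign(label=["a", "b", "c", "d", "e"])
filled_mixed = mixed.fillna(mixed.mean(numeric_only=True))
print(filled_mixed)
```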

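There is no need to replace NaN with 0 in the first place: that removes every NaN, leaving nothing for fillna to fill, and any mean computed afterwards would count the zeros. Nor do you need to pass skipna when calculating the means, as another answer suggested, because skipna=True is already the default.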
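To see what filling with 0 first does, compare the mean of col1 from the example above with and without that step:

```python
import numpy as np
import pandas as pd

# col1 from the example above.
s = pd.Series([1.0, 0.1, np.nan, 4.0, 8.0])

# NaN skipped: (1 + 0.1 + 4 + 8) / 4
print(s.mean())

# After replacing NaN with 0, the zero is counted: (1 + 0.1 + 0 + 4 + 8) / 5
zeroed = s.replace(np.nan, 0)
print(zeroed.mean())

# No NaN remains, so a later fillna has nothing to fill.
print(zeroed.isna().sum())
```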

OCa