
After filling in the nulls in all of my category columns using the solutions from here and here, I was left with many nulls in many of my float columns. I thought a simple df.fillna(0.0, inplace=True) would work, but I get the error ValueError: fill value must be in categories. I thought this error applied only to category-type columns.

To summarize: I have many float columns and many category columns. I filled the category columns by adding a category "unknown" and then filling their nulls with "unknown". Now, a simple

    df.fillna(0.0, inplace = True)  

should have worked, but it does not.
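For reference, the category-filling step described above (register an "unknown" level, then fill the nulls with it) can be sketched roughly as follows. This is one possible way to do it, not necessarily the exact solutions linked above, and the column names `color` and `price` are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: one category column and one float column, both with NaN
df = pd.DataFrame({
    "color": pd.Series(["red", np.nan, "blue"], dtype="category"),
    "price": [1.5, np.nan, 3.0],
})

# For every category column, add "unknown" as a level first, then fill with it
for col in df.select_dtypes(include="category").columns:
    df[col] = df[col].cat.add_categories("unknown").fillna("unknown")

# The float column "price" still contains its NaN at this point
print(df)
```

Note that `add_categories` refuses to add a level that already exists, so this loop is meant to run once per column.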

A simple way to reproduce this problem is the following:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({"A": ["a"], "B": [np.nan]})
    df['A'] = df['A'].astype('category')
    df.fillna(0.0, inplace=True)

Please don't suggest filling column by column, e.g.:

    df['B'].fillna(0.0, inplace=True)

I have many float columns and can't go through them one by one; I need to fill all the nulls in the remaining columns with 0.0 in bulk. Rest assured that all of these columns are of float type. There may be additional category columns, but they don't have any nulls.

Appreciate any solutions.

learner

2 Answers


The main issue here is that pandas doesn't allow replacing NaN values in a category column with a value that is not one of its category levels. For example, df.fillna('a') works, since 'a' is one of the category levels. Interestingly, pandas raises a ValueError even when the category columns contain no NaN at all (a bug, maybe?). So you'll have to specify either the target columns or the target dtypes in order to fill NaN.
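To see the rule in isolation, here is a minimal sketch on a single categorical Series (a hypothetical example, not part of the original answer): filling with an existing level succeeds, while filling with 0.0 is rejected. The exception type may differ between pandas versions (the question shows a ValueError; newer releases may raise TypeError), so the sketch catches both.

```python
import numpy as np
import pandas as pd

s = pd.Series(["a", np.nan], dtype="category")

# 'a' is already a category level, so this fill is allowed
filled = s.fillna("a")

# 0.0 is not a level, so pandas refuses the fill; catch either exception type
try:
    s.fillna(0.0)
    raised = False
except (TypeError, ValueError) as exc:
    raised = True
    print(type(exc).__name__, exc)

print(filled.tolist(), raised)
```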

That said, you can easily replace NaN in as many float columns as you have with the following (note that fillna returns a new DataFrame here, so assign the result back):

    df = df.fillna({col: 0.0 for col in df.columns[df.dtypes.eq(float)]})

or

    df.loc[:, df.dtypes.eq(float)] = df.select_dtypes(float).fillna(0.0)

Alternatively, you can fill NaN in all columns except the category ones:

    df.loc[:, df.dtypes.ne('category')] = df.select_dtypes(exclude='category').fillna(0.0)
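Applied to the reproduction from the question, the same idea might look like this. It is a sketch that selects the columns by label via `select_dtypes` instead of using a boolean dtype mask; the variable name `non_cat_cols` is just for illustration.

```python
import numpy as np
import pandas as pd

# The reproduction from the question
df = pd.DataFrame({"A": ["a"], "B": [np.nan]})
df["A"] = df["A"].astype("category")

# Labels of every column that is not categorical
non_cat_cols = df.select_dtypes(exclude="category").columns

# Fill NaN only in those columns and assign them back; "A" is never touched
df[non_cat_cols] = df[non_cat_cols].fillna(0.0)
print(df)
```

Because the category columns are never passed to fillna, pandas has no reason to check 0.0 against their levels.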

Update:

Apparently, an issue about this has already been opened. Take a look at: https://github.com/pandas-dev/pandas/issues/24079.

Cainã Max Couto-Silva

You need to apply fillna() only to the slice that contains the float columns, then reassign it. Otherwise pandas checks whether 0.0 is a valid value for the category columns too, even if they have nothing to fill.

This modification to your example will work:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({"A": ["a"], "B": [np.nan]})
    df['A'] = df['A'].astype('category')
    # Select only the float64 columns, fill their NaN and assign them back
    float_cols = [c for c in df.columns if df[c].dtype == "float64"]
    df.loc[:, float_cols] = df.loc[:, float_cols].fillna(0.0)
    df
Gena Kukartsev