I am using this data set: Titanic passengers. I am trying to fill in missing categorical data, but fillna() with the inplace option does not do anything:

import numpy as np
import pandas as pd

data = pd.read_csv('https://www.openml.org/data/get_csv/16826755/phpMYEkMl')

# replace question marks with np.nan
data = data.replace('?', np.nan)

var_categor = ['sex', 'cabin', 'embarked']

data.loc[:, var_categor].fillna("Missing", inplace=True)

I get the same number of nan values:

data[var_categor].isnull().sum()

I get no error messages and no warnings; it just doesn't do anything. Is this normal behavior? Shouldn't it give a warning?

KZiovas
  • https://stackoverflow.com/questions/55744015/iloc-fillna-inplace-true-vs-inplace-false – It_is_Chris Oct 01 '21 at 19:52
  • Yes, this is normal behaviour. `data.loc[:, var_categor]` creates a copy, and `inplace` affects only that copy. Since there are no references to that object, it is no longer accessible afterwards. It does not give a warning because almost all levels of pandas [discourage the use of inplace](https://stackoverflow.com/a/60020384/15497888), and eventually it will be removed from all methods, which is why warnings for this case have not been implemented the way they are for chained expressions with [SettingWithCopyWarning](https://stackoverflow.com/q/20625582/15497888). – Henry Ecker Oct 01 '21 at 19:59

2 Answers

Try chaining the operations and assigning the returned copy back, rather than modifying in place:

data[var_categor] = data.replace('?', np.nan)[var_categor].fillna('Missing')
>>> data[var_categor].isna().sum()
sex         0
cabin       0
embarked    0
dtype: int64
Corralien
  • Hello. Yes, I know I can do that, but why is it not working? Is it normal behavior that it just doesn't? – KZiovas Oct 01 '21 at 20:07
  • No, it's not normal behavior, because the option exists but does not work as expected. The `inplace` parameter should be removed (I hope). You can read [this issue](https://github.com/pandas-dev/pandas/issues/16529) and [this question](https://stackoverflow.com/q/45570984/15239951) to learn more. – Corralien Oct 01 '21 at 20:18

It’s likely an issue with getting a view/slice/copy of the dataframe, and setting things in-place on that object.
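To make the copy issue concrete, here is a minimal sketch on a small made-up frame (the data is hypothetical, not the Titanic set, and the `subset` name is only for illustration). It gives the intermediate object a name so that both it and the original can be inspected after the in-place fill:

```python
import numpy as np
import pandas as pd

# Small stand-in frame (hypothetical data, not the Titanic set)
data = pd.DataFrame({
    "sex": ["male", np.nan, "female"],
    "cabin": [np.nan, "C85", np.nan],
    "age": [22.0, 38.0, 26.0],
})
var_categor = ["sex", "cabin"]

# Selecting a list of columns with .loc returns a new DataFrame.
# In the question this object is never bound to a name, so the
# in-place fill is applied to a temporary that is then discarded.
subset = data.loc[:, var_categor]
subset.fillna("Missing", inplace=True)

# Count the NaN values left in each object.
missing_in_subset = int(subset.isna().sum().sum())
missing_in_data = int(data[var_categor].isna().sum().sum())

print(missing_in_subset)  # NaN count in the filled copy
print(missing_in_data)    # NaN count in the original frame
```

The copy ends up with no missing values, while the original keeps all of its NaN entries, which matches what the question observes.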

The trivial fix is, of course, not to use inplace:

data[var_categor] = data[var_categor].fillna("Missing")

An alternate way is to use .fillna directly on the object. Here if you want to limit which columns are filled, a dictionary mapping columns to replacement values can be used:

>>> data.fillna({var: 'Missing' for var in var_categor}, inplace=True)
>>> data[var_categor].isna().sum()
sex         0
cabin       0
embarked    0
dtype: int64

However, best practice in pandas is to avoid inplace; see the GitHub issue that discusses deprecating it for more detail.
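Following that advice, the dictionary form can also be written without inplace by assigning the result back. This is only a sketch on a small made-up frame (hypothetical data, not the Titanic set); the `remaining` name is introduced here for illustration:

```python
import numpy as np
import pandas as pd

# Small stand-in frame (hypothetical data, not the Titanic set)
data = pd.DataFrame({
    "sex": ["male", np.nan],
    "embarked": ["S", np.nan],
    "age": [22.0, np.nan],
})
var_categor = ["sex", "embarked"]

# A dict passed to fillna fills only the listed columns; assigning the
# returned frame back to `data` takes the place of inplace=True.
data = data.fillna({col: "Missing" for col in var_categor})

# Per-column count of NaN values that are still present.
remaining = data.isna().sum().to_dict()
print(remaining)
```

Columns that are not in the dictionary, such as `age` here, keep their NaN values.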

Cimbali