I want to modify only the numeric variables in my data frame, i.e. impute missing values of numeric variables with the median and those of factor variables with the mode. To modify only the numeric variables, I tried the following:

xTrain.select_dtypes(include=numerics) =  xTrain.select_dtypes(include=numerics).fillna(xTrain.mean(), inplace=True)

but it says:

SyntaxError: can't assign to function call

The following does work, but I am not happy with it because it doesn't involve an assignment operation ('='). Moreover, _get_numeric_data is a "private method" (i.e., an implementation detail) and is subject to change or total removal in the future; another answer recommended using it with caution:

xTrain._get_numeric_data().fillna(xTrain.mean(), inplace=True)

Is there an alternative way to select just the numeric columns and impute them within the whole data frame, i.e. to modify only part of the data frame? Thanks in advance!

Bharat Ram Ammu

1 Answer

You can get the names of all numeric columns with DataFrame.select_dtypes and then assign to that subset, which works nicely:

import numpy as np
import pandas as pd

xTrain = pd.DataFrame({'address':['a', 'b', 'c'],'b':[1,2, np.nan]})
print (xTrain)
  address    b
0       a  1.0
1       b  2.0
2       c  NaN

cols = xTrain.select_dtypes(include=np.number).columns

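# Take the mean of the numeric columns only; in pandas 2.0+ DataFrame.mean()
# raises a TypeError when the frame also contains string columns.
xTrain[cols] = xTrain[cols].fillna(xTrain[cols].mean())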
print (xTrain)
  address    b
0       a  1.0
1       b  2.0
2       c  1.5
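
The question also asks for the median on numeric columns and the mode on factor columns. The same column-subset assignment can be extended to cover both; the following is only a minimal sketch under assumptions, not a tested recipe for the asker's data. The frame df and its columns city and value are hypothetical, and every non-numeric column is assumed to be categorical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one categorical column and one numeric column.
df = pd.DataFrame({
    "city": ["a", "b", None, "a"],
    "value": [1.0, 2.0, np.nan, 6.0],
})

# Numeric columns: fill missing values with each column's median.
num_cols = df.select_dtypes(include=np.number).columns
df[num_cols] = df[num_cols].fillna(df[num_cols].median())

# Remaining columns (assumed categorical): fill with the most frequent value.
# mode() returns a DataFrame because ties are possible; row 0 holds the
# first mode of each column.
cat_cols = df.columns.difference(num_cols)
df[cat_cols] = df[cat_cols].fillna(df[cat_cols].mode().iloc[0])

print(df)
```

If a column has several equally frequent values, iloc[0] picks the first one in sorted order, which may or may not be what you want.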
jezrael
    Excellent syntax, worth documenting. Storing the column names in cols and then assigning through that list is exactly what I wanted, and it keeps the code comprehensible. Thanks! – Bharat Ram Ammu Jun 25 '19 at 13:16