
I am working on the Melbourne housing dataset, and during preprocessing I'm trying to impute missing data using the mean/median strategy. I tried using Imputer from sklearn.preprocessing:

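from sklearn.preprocessing import Imputer
imp = Imputer(strategy='mean')
# fit() returns the fitted imputer itself; fit_transform() returns the imputed data
dataset = imp.fit_transform(dataset)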

Running this raises the following error:

ValueError: could not convert string to float: 'Western Metropolitan'

I know that imputation with these strategies works only on numeric (float) values, but I need to do one of the following:

1) Impute only the non-string (numeric) columns of the dataset

2) Impute the string columns as well
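Option 1 (imputing only the non-string columns) can be sketched as follows. This assumes scikit-learn 0.20 or later, where Imputer was superseded by SimpleImputer in sklearn.impute (Imputer was removed in 0.22). The small DataFrame and its column names are a hypothetical stand-in for the Melbourne data, and the helper name impute_numeric_only is illustrative only.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer


def impute_numeric_only(df: pd.DataFrame, strategy: str = "median") -> pd.DataFrame:
    """Hypothetical helper: impute numeric columns, leave string columns untouched."""
    out = df.copy()
    # Pick only numeric columns; string (object) columns are skipped
    numeric_cols = out.select_dtypes(include="number").columns
    imputer = SimpleImputer(strategy=strategy)
    # fit_transform returns a NumPy array; write it back into the numeric columns only
    out[numeric_cols] = imputer.fit_transform(out[numeric_cols])
    return out


# Tiny illustrative frame (column names assumed, not taken from the real file)
df = pd.DataFrame({
    "Regionname": ["Western Metropolitan", None, "Northern Metropolitan"],
    "Rooms": [2, 3, np.nan],
    "Price": [np.nan, 1_000_000.0, 800_000.0],
})

result = impute_numeric_only(df)
print(result)
```

The string column keeps its missing value, which you can then handle separately (for example by dropping those rows or using a categorical strategy).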

I could not find a solution online. Thanks in advance.

Umang Mistry
  • You shouldn't even try to impute strings. Use only the columns without strings. Or, better yet, depending on the model you're working on, drop the rows with empty values (assuming very high accuracy isn't the target); a few unclear values won't make much difference. You could even train a separate model to impute (i.e., predict) those fields. – Souyama Mar 17 '19 at 14:25
  • The link below explains how to handle your issue: https://stackoverflow.com/questions/25239958/impute-categorical-missing-values-in-scikit-learn – Giri Mar 17 '19 at 14:32
  • I referred to that thread, and it helped me fix the problem. Thanks a lot! @Giri – Umang Mistry Mar 18 '19 at 08:28
  • Does this answer your question? [Impute categorical missing values in scikit-learn](https://stackoverflow.com/questions/25239958/impute-categorical-missing-values-in-scikit-learn) – zhrist Mar 31 '23 at 07:59
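The row-dropping alternative from the first comment can be sketched with pandas alone. The DataFrame and column names below are hypothetical; in practice, check how many rows you lose before committing to this approach.

```python
import numpy as np
import pandas as pd

# Illustrative data with missing values in different columns (names assumed)
df = pd.DataFrame({
    "Regionname": ["Western Metropolitan", None, "Northern Metropolitan", "Eastern Metropolitan"],
    "Rooms": [2, 3, np.nan, 4],
    "Price": [1_200_000.0, 1_000_000.0, 800_000.0, np.nan],
})

# Drop every row that has at least one missing value
complete_rows = df.dropna()

# Or drop rows only when a specific column (here Price) is missing
has_price = df.dropna(subset=["Price"])

print(len(df), len(complete_rows), len(has_price))  # 4 1 3
```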

2 Answers


scikit-learn's mean/median imputation doesn't handle categorical (string) variables. You need to dummify (one-hot encode) all your categorical variables before imputing the missing values. Even a single categorical column triggers the error.
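A minimal sketch of the dummify-then-impute approach, assuming pandas get_dummies for the encoding and scikit-learn's SimpleImputer (the replacement for Imputer). Passing dummy_na=True, which adds an indicator column for missing categories, is an extra choice not stated in the answer; the data and column names are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Illustrative data: one string column, one numeric column (names assumed)
df = pd.DataFrame({
    "Regionname": ["Western Metropolitan", None, "Northern Metropolitan"],
    "Rooms": [2, 3, np.nan],
})

# One-hot encode the string columns; dummy_na=True adds a column flagging missing categories
encoded = pd.get_dummies(df, dummy_na=True, dtype=float)

# Every column is numeric now, so mean imputation no longer raises ValueError
imputer = SimpleImputer(strategy="mean")
imputed = pd.DataFrame(imputer.fit_transform(encoded), columns=encoded.columns)
print(imputed)
```

Only the numeric column actually gets mean-imputed here; the missing category is captured by the indicator column instead.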

sandeep patil

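Use strategy="most_frequent" or strategy="constant" instead of "mean"/"median" for the string columns. With SimpleImputer (sklearn.impute), which replaced Imputer, both strategies also work on string (categorical) data; "constant" is not available in the old Imputer.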
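One way to combine both strategies in a single step is scikit-learn's ColumnTransformer: median for numeric columns, most frequent value for string columns. This is a sketch assuming scikit-learn 0.20 or later; the data and column names are hypothetical. Missing strings are written as np.nan because SimpleImputer's default missing_values is np.nan.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer

# Illustrative data (column names assumed)
df = pd.DataFrame({
    "Regionname": ["Western Metropolitan", np.nan, "Western Metropolitan", "Northern Metropolitan"],
    "Rooms": [2.0, 3.0, np.nan, 4.0],
})

numeric_cols = df.select_dtypes(include="number").columns.tolist()
categorical_cols = df.select_dtypes(exclude="number").columns.tolist()

preprocess = ColumnTransformer([
    # Median for numeric columns
    ("num", SimpleImputer(strategy="median"), numeric_cols),
    # Most frequent category for string columns;
    # SimpleImputer(strategy="constant", fill_value="missing") would also work here
    ("cat", SimpleImputer(strategy="most_frequent"), categorical_cols),
])

# Output columns follow the transformer order: numeric first, then categorical
imputed = pd.DataFrame(preprocess.fit_transform(df), columns=numeric_cols + categorical_cols)
print(imputed)
```

Because the numeric and string outputs are stacked together, the resulting array has object dtype; convert the numeric columns back with astype(float) if a later step needs them as floats.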