I am trying to replace NaN values in a dataset with the column mean using sklearn.preprocessing.Imputer. Instead of being replaced, the NaN values are removed by my code. Here is a short example demonstrating the issue:
>>> import numpy as np
>>> from sklearn.preprocessing import Imputer
>>> test_data = np.array([float("NaN"), 1, 2, 3])
>>> imp = Imputer(missing_values=float("NaN"), strategy="mean")
>>> imp.fit_transform(test_data)
** Deprecation warning truncated **
array([[1., 2., 3.]])
What should I change so that the NaN is replaced by 2. rather than removed?
I tried to adapt the example from the sklearn.preprocessing.Imputer user guide and was originally following this answer, but I must have misunderstood them.
Edit:
I have also tried the following, which gets rid of the deprecation warning but does not change the end result:
>>> test_data = np.array([[float("NaN"), 1, 2, 3]])
>>> imp = Imputer(missing_values=float("NaN"), strategy="mean")
>>> imp.fit_transform(test_data)
array([[1., 2., 3.]])
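A possible explanation, offered as a sketch rather than a confirmed diagnosis: with its default axis=0, Imputer computes the mean per column, and its documentation notes that columns containing only missing values at fit time are discarded on transform. In a 1x4 array each value is its own column, so the NaN column has nothing to average and is dropped. The NumPy code below only mimics that behaviour; impute_column_mean is a hypothetical helper written for illustration, not the sklearn implementation.

```python
import numpy as np

def impute_column_mean(X):
    """Hypothetical stand-in for column-wise mean imputation.

    Drops columns that contain only NaN, then replaces each remaining
    NaN with the mean of the observed values in its column.
    """
    X = np.asarray(X, dtype=float)
    # Keep only columns that have at least one observed (non-NaN) value.
    keep = ~np.isnan(X).all(axis=0)
    X = X[:, keep]
    # Per-column mean that ignores NaN entries.
    col_means = np.nanmean(X, axis=0)
    # Fill each NaN with the mean of its own column.
    return np.where(np.isnan(X), col_means, X)

row = np.array([[np.nan, 1, 2, 3]])   # 1 sample, 4 features
column = row.reshape(-1, 1)           # 4 samples, 1 feature

as_row = impute_column_mean(row)        # the all-NaN column is dropped
as_column = impute_column_mean(column)  # the NaN is filled with the column mean
print(as_row)
print(as_column)
```

If this reading is right, passing test_data.reshape(-1, 1) to imp.fit_transform should fill the NaN with the mean of the other three values instead of removing it.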