Instead of replacing 0 with None, we can use numpy.nan like so:
>>> import numpy as np
>>> temp["Glucose"] = diabetes_data["Glucose"].replace(0, np.nan)
>>> temp.loc[null_index]
     Pregnancies  Glucose  BloodPressure  SkinThickness  Insulin   BMI  DiabetesPedigreeFunction  Age  Outcome
75             1      NaN             48             20        0  24.7                     0.140   22        0
182            1      NaN             74             20       23  27.7                     0.299   21        0
342            1      NaN             68             35        0  32.0                     0.389   22        0
349            5      NaN             80             32        0  41.0                     0.346   37        1
502            6      NaN             68             41        0  39.0                     0.727   41        1
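As a quick sanity check (a sketch reusing the temp / diabetes_data names from above, and assuming the Glucose column had no missing values before the replacement):
>>> # no zeros should remain, and each former 0 now shows up as NaN
>>> (temp["Glucose"] == 0).any()
False
>>> temp["Glucose"].isna().sum() == (diabetes_data["Glucose"] == 0).sum()
True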
What is going on:
The first two arguments to .replace are to_replace and value, both of which default to None.
When you explicitly pass None as the second argument (i.e. for value), there is no difference from calling .replace without the value argument at all. With no replacement value supplied, .replace falls back on the method argument, which defaults to 'pad' (forward fill): almost certainly not the effect you want here.
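To see this on a minimal, self-contained example (a toy Series rather than the diabetes data; the outputs assume a pandas version where method still defaults to 'pad', as in the documentation quoted below):
>>> import numpy as np
>>> import pandas as pd
>>> s = pd.Series([5, 0, 7, 0])
>>> s.replace(0, None)    # value=None -> falls back to method='pad' (forward fill)
0    5
1    5
2    7
3    7
dtype: int64
>>> s.replace(0, np.nan)  # explicit NaN -> a genuine value-for-value replacement
0    5.0
1    NaN
2    7.0
3    NaN
dtype: float64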
This means the issue isn't that you're using an int; it's the value you're trying to replace the int with.
This case is actually explicitly explained in the documentation, and a workaround using a dictionary argument is provided:
Compare the behavior of s.replace({'a': None}) and s.replace('a', None) to understand the peculiarities of the to_replace parameter:
>>> s = pd.Series([10, 'a', 'a', 'b', 'a'])
When one uses a dict as the to_replace value, it is like the value(s) in the dict are equal to the value parameter. s.replace({'a': None}) is equivalent to s.replace(to_replace={'a': None}, value=None, method=None):
>>> s.replace({'a': None})
0      10
1    None
2    None
3       b
4    None
dtype: object
When value=None and to_replace is a scalar, list or tuple, replace uses the method parameter (default ‘pad’) to do the replacement. So this is why the ‘a’ values are being replaced by 10 in rows 1 and 2 and ‘b’ in row 4 in this case. The command s.replace('a', None) is actually equivalent to s.replace(to_replace='a', value=None, method='pad'):
>>> s.replace('a', None)
0    10
1    10
2    10
3     b
4     b
dtype: object
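Back to the original problem, the same dictionary workaround applies (a sketch reusing the diabetes_data, temp and null_index names from the question, with np.nan as the dict value, as recommended above):
>>> # with a dict, the dict's value is the replacement value itself,
>>> # so the method='pad' fallback is never triggered
>>> temp["Glucose"] = diabetes_data["Glucose"].replace({0: np.nan})
>>> temp.loc[null_index, "Glucose"]
75    NaN
182   NaN
342   NaN
349   NaN
502   NaN
Name: Glucose, dtype: float64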