Dataframe's replace method behaving weird

Question

I have a dataset and want to replace the 0's of a particular column with None.

import pandas as pd

diabetes_data = pd.read_csv("https://raw.githubusercontent.com/npradaschnor/Pima-Indians-Diabetes-Dataset/master/diabetes.csv")
temp = diabetes_data.copy()
null_index = diabetes_data[diabetes_data["Glucose"]==0].index
print(diabetes_data.loc[null_index])

Then I use the replace method to try to replace 0 with None:

temp["Glucose"] = diabetes_data["Glucose"].replace(0, None)
print(temp.loc[null_index])

In the output above, the values are not replaced with None, and I am not sure what the new values are.

But when I pass the arguments to the replace function as lists, it works as expected.

temp1 = diabetes_data.copy()
temp1["Glucose"].replace([0], [None], inplace=True)
temp1.loc[null_index]

Above is the output I expect. The documentation for replace says that it accepts int as well.

So I do not understand why it gives a strange result when an int is passed.

score 2 · Accepted Answer · edited Sep 17 '21 at 10:53

Instead of replacing 0 with None, we can use numpy.nan like so:

>>> import numpy as np
>>> temp["Glucose"] = diabetes_data["Glucose"].replace(0, np.nan)
>>> temp.loc[null_index]
        Pregnancies     Glucose     BloodPressure   SkinThickness   Insulin     BMI     DiabetesPedigreeFunction    Age     Outcome
75      1               NaN         48              20              0           24.7    0.140                       22      0
182     1               NaN         74              20              23          27.7    0.299                       21      0
342     1               NaN         68              35              0           32.0    0.389                       22      0
349     5               NaN         80              32              0           41.0    0.346                       37      1
502     6               NaN         68              41              0           39.0    0.727                       41      1
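The same idea can be tried without downloading the dataset. The following is a minimal sketch on a small made-up Series standing in for the Glucose column (the values are hypothetical, not taken from the CSV). It also shows `Series.mask` as an alternative way to blank out the zeros.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the Glucose column; 0 marks a missing reading.
glucose = pd.Series([148, 0, 85, 0, 183], name="Glucose")

# Replace the sentinel 0 with NaN; the column is upcast to float to hold NaN.
replaced = glucose.replace(0, np.nan)

# Alternative without replace: mask() sets entries to NaN where the condition is True.
masked = glucose.mask(glucose == 0)

print(replaced.isna().sum())    # how many values are now missing
print(replaced.equals(masked))  # both approaches give the same Series
```

Either form makes the missing readings visible to `isna()`, `dropna()` and `fillna()`, which a literal 0 is not.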

What is going on:

The first two arguments to .replace are to_replace and value, both of which default to None.

When you explicitly pass None as the second argument (i.e. for value), it is no different from calling replace without the value argument at all. With no value given and a scalar to_replace, .replace falls back on the method argument, which defaults to pad (forward fill): each matched entry is overwritten with the nearest preceding entry that did not match. That is almost certainly not the effect you want here.

This means the issue has nothing to do with passing an int; it comes from the value you are trying to replace the int with.
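To see where the unexpected values come from, the pad behaviour can be written out by hand. This is only a sketch on made-up numbers (a hypothetical stand-in for the Glucose column). It deliberately avoids calling replace with None, since that call may behave differently across pandas versions, and spells out the forward fill with `mask` and `ffill` instead.

```python
import pandas as pd

# Hypothetical stand-in for the Glucose column; 0 marks a missing reading.
glucose = pd.Series([148, 0, 85, 0, 183], name="Glucose")

# Step 1: blank out the matched entries (they become NaN).
blanked = glucose.mask(glucose == 0)

# Step 2: forward-fill, i.e. copy the previous non-missing value downwards.
padded = blanked.ffill()

print(padded.tolist())  # each former 0 now repeats the value from the row above it
```

Each value that appears in place of a 0 is simply the Glucose reading from the nearest preceding row that was not 0.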

An example from the pandas documentation:

This case is actually explicitly explained in the documentation, and a workaround using a dictionary argument is provided:

Compare the behavior of s.replace({'a': None}) and s.replace('a', None) to understand the peculiarities of the to_replace parameter:

>>> s = pd.Series([10, 'a', 'a', 'b', 'a'])

When one uses a dict as the to_replace value, it is like the value(s) in the dict are equal to the value parameter. s.replace({'a': None}) is equivalent to s.replace(to_replace={'a': None}, value=None, method=None):

>>> s.replace({'a': None})
0      10
1    None
2    None
3       b
4    None
dtype: object

When value=None and to_replace is a scalar, list or tuple, replace uses the method parameter (default ‘pad’) to do the replacement. So this is why the ‘a’ values are being replaced by 10 in rows 1 and 2 and ‘b’ in row 4 in this case. The command s.replace('a', None) is actually equivalent to s.replace(to_replace='a', value=None, method='pad'):

>>> s.replace('a', None)
0    10
1    10
2    10
3     b
4     b
dtype: object
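The two outputs above can be checked with a short sketch. The dict form is called directly. The pad result is reproduced with an explicit `mask` plus `ffill` rather than with `s.replace('a', None)`, on the assumption that the scalar form may not behave the same way in every pandas version.

```python
import pandas as pd

s = pd.Series([10, "a", "a", "b", "a"])

# Dict form: the mapping itself carries the replacement, so no fill method is involved.
with_dict = s.replace({"a": None})
print(with_dict.isna().tolist())  # True wherever 'a' used to be

# Forward fill written out by hand: blank the 'a' entries, then pad from above.
padded = s.mask(s == "a").ffill()
print(padded.tolist())
```

The dict form is the one to reach for when the replacement really should be a missing value, because it leaves no room for the fill method to take over.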

I tried the code and as @learnToCode said it doesn't work. Using `np.nan` works directly. Yes I tried it :), the code displayed in this answer comes from my side. — tlentali, Sep 17 '21 at 10:31
Seems this is copied from dupe... Is possible convert to wiki? — jezrael, Sep 17 '21 at 10:34
I am sorry, I don't think that I have enough reputation to do such a thing. I am checking what convert to wiki means in the SO doc. — tlentali, Sep 17 '21 at 10:41
@jezrael The values the OP sees are ffill (`pad`) values, from the source: https://github.com/pandas-dev/pandas/blob/v1.3.3/pandas/core/generic.py#L6472 — Andrej Kesely, Sep 17 '21 at 10:42
Do you try it? check EDIT and then convert to Community wiki under answer. — jezrael, Sep 17 '21 at 10:42
Instead of `convert to wiki` I have a `community wiki` checkbox under my answer, it says that by checking that I will lose some reputation point. Should I do it ? — tlentali, Sep 17 '21 at 10:46
@jezrael as you asked me to do so, I did convert it, hope it is better now. — tlentali, Sep 17 '21 at 10:55
@tlentali - Sure, you lost it, but I can motivate you in some way so you don't lose out ;) — jezrael, Sep 17 '21 at 10:55
