I have a dataset and want to replace the 0's of a particular column with None.

import pandas as pd

diabetes_data = pd.read_csv("https://raw.githubusercontent.com/npradaschnor/Pima-Indians-Diabetes-Dataset/master/diabetes.csv")
temp = diabetes_data.copy()
# Index labels of the rows whose Glucose value is 0
null_index = diabetes_data[diabetes_data["Glucose"]==0].index
print(diabetes_data.loc[null_index])

[Image: output of original DataFrame]

Then I use the replace method to try to replace 0 with None:

temp["Glucose"] = diabetes_data["Glucose"].replace(0, None)
print(temp.loc[null_index])

[Image: output of temp DataFrame]

The image above shows that the values are not replaced with None. I am also not sure what the values that appear in their place are.

But when I pass the arguments to the replace function as lists, it works as expected.

temp1 = diabetes_data.copy()
temp1["Glucose"].replace([0], [None], inplace=True)
temp1.loc[null_index]

[Image: output when parameters are passed as lists]

Above is the output I expect. The documentation for replace says that it accepts an int as well.

So why does it give a weird result when an int is passed?

learnToCode

1 Answer

Instead of replacing 0 with None, we can use numpy.nan, like so:

>>> import numpy as np
>>> temp["Glucose"] = diabetes_data["Glucose"].replace(0, np.nan)
>>> temp.loc[null_index]
        Pregnancies     Glucose     BloodPressure   SkinThickness   Insulin     BMI     DiabetesPedigreeFunction    Age     Outcome
75      1               NaN         48              20              0           24.7    0.140                       22      0
182     1               NaN         74              20              23          27.7    0.299                       21      0
342     1               NaN         68              35              0           32.0    0.389                       22      0
349     5               NaN         80              32              0           41.0    0.346                       37      1
502     6               NaN         68              41              0           39.0    0.727                       41      1
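
The same replacement can be tried without downloading the dataset. The following is a minimal sketch on a small made-up frame; the `demo` DataFrame, its values, and the names `zero_index` and `cleaned` are illustrative assumptions, not rows or names from the Pima dataset:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for diabetes_data; the values are made up.
demo = pd.DataFrame({"Glucose": [148, 0, 85, 0], "Age": [50, 22, 31, 37]})

# Rows where Glucose is recorded as 0, mirroring null_index above.
zero_index = demo[demo["Glucose"] == 0].index

cleaned = demo.copy()
cleaned["Glucose"] = demo["Glucose"].replace(0, np.nan)

# The integer column becomes float because NaN is a float value.
print(cleaned.loc[zero_index])
print(cleaned["Glucose"].isna().sum())
```

Only the Glucose column is touched; the other columns keep their original values and dtypes.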

What is going on:

The first two parameters of .replace are to_replace and value, both of which default to None.

When you explicitly pass None as the second argument (i.e. for value), the call is no different from calling replace without the value argument at all. With no value given and a scalar to_replace, .replace falls back on the method parameter, which defaults to pad (forward fill): each matching entry is overwritten with the last non-matching value before it. That is probably a very undesirable effect in this case, and it explains the unfamiliar values in your output.

So the issue has nothing to do with passing an int; it comes from the value you are trying to replace the int with.
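
The forward-fill effect described above can be reproduced explicitly, which shows where the unexpected values come from. This is a sketch on a made-up Series (the names `glucose`, `blanked` and `padded` and the values are assumptions). It uses `mask` and `ffill` rather than `replace`, so it does not depend on how a given pandas version treats an explicit None:

```python
import pandas as pd

# Made-up values; zeros stand in for the entries to be replaced.
glucose = pd.Series([10, 0, 0, 7, 0])

# Step 1: blank out the zeros (they become NaN, so the dtype turns float).
blanked = glucose.mask(glucose == 0)

# Step 2: forward fill ("pad"): each gap takes the last value before it.
padded = blanked.ffill()

print(padded.tolist())  # [10.0, 10.0, 10.0, 7.0, 7.0]
```

Each zero ends up holding the nearest earlier non-zero value, which is the kind of output shown in the question.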

This case is explicitly explained in the pandas documentation, which also provides a workaround using a dictionary argument. The example from the documentation:

Compare the behavior of s.replace({'a': None}) and s.replace('a', None) to understand the peculiarities of the to_replace parameter:

>>> s = pd.Series([10, 'a', 'a', 'b', 'a'])

When one uses a dict as the to_replace value, it is like the value(s) in the dict are equal to the value parameter. s.replace({'a': None}) is equivalent to s.replace(to_replace={'a': None}, value=None, method=None):

s.replace({'a': None})
0      10
1    None
2    None
3       b
4    None
dtype: object

When value=None and to_replace is a scalar, list or tuple, replace uses the method parameter (default ‘pad’) to do the replacement. So this is why the ‘a’ values are being replaced by 10 in rows 1 and 2 and ‘b’ in row 4 in this case. The command s.replace('a', None) is actually equivalent to s.replace(to_replace='a', value=None, method='pad'):

s.replace('a', None)
0    10
1    10
2    10
3     b
4     b
dtype: object
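
Applied to a numeric column like the one in the question, the dictionary form from the documentation might look like the sketch below. The Series is made up for illustration, and whether the replaced entries show up as None or NaN may depend on the pandas version, so the check uses `pd.isna`, which is true for both:

```python
import pandas as pd

# Made-up values standing in for the Glucose column.
glucose = pd.Series([148, 0, 85, 0])

# Dictionary form: the replacement lives inside to_replace,
# so the value parameter is never left to its default.
replaced = glucose.replace({0: None})

# pd.isna treats both None and NaN as missing.
print(pd.isna(replaced).tolist())
```

If a float column with NaN is acceptable, replacing with `np.nan` as shown at the top of this answer avoids the question of how None is handled altogether.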
nathan.j.mcdougall
tlentali
  • I tried the code and, as @learnToCode said, it doesn't work. Using `np.nan` works directly. Yes, I tried it :), the code displayed in this answer comes from my side. – tlentali Sep 17 '21 at 10:31
  • Check the dupe; it explains what the OP needs. – jezrael Sep 17 '21 at 10:33
  • Yes, good idea. I am looking for it because I can't explain it. – tlentali Sep 17 '21 at 10:34
  • It seems this is copied from the dupe... Is it possible to convert it to a wiki? – jezrael Sep 17 '21 at 10:34
  • I am sorry, I don't think I have enough reputation to do such a thing. I am checking what "convert to wiki" means in the SO docs. – tlentali Sep 17 '21 at 10:41
  • @jezrael The values the OP sees are ffill (`pad`) values, from the source: https://github.com/pandas-dev/pandas/blob/v1.3.3/pandas/core/generic.py#L6472 – Andrej Kesely Sep 17 '21 at 10:42
  • Did you try it? Check the EDIT, and then convert to community wiki under the answer. – jezrael Sep 17 '21 at 10:42
  • Instead of `convert to wiki` I have a `community wiki` checkbox under my answer; it says that by checking it I will lose some reputation points. Should I do it? – tlentali Sep 17 '21 at 10:46
  • @jezrael As you asked, I converted it. I hope it is better now. – tlentali Sep 17 '21 at 10:55
  • @tlentali - Sure, you lose some, but I can motivate you in some way so it isn't lost ;) – jezrael Sep 17 '21 at 10:55