
My pandas DataFrame datainput has four columns, COLUMN1, COLUMN2, COLUMN3 and COLUMN4, each holding Yes or No values. I am trying to replace the "Yes" and "No" values with 1 and 0 using the following code:

datainput.COLUMN1.replace(("Yes","No"),(1,0),inplace=True)
datainput.COLUMN2.replace(("Yes","No"),(1,0),inplace=True)
datainput.COLUMN3.replace(("Yes","No"),(1,0),inplace=True)
datainput.COLUMN4.replace(("Yes","No"),(1,0),inplace=True)

The values are converted successfully, but I get the following warning:

C:\Users\mmpra\Anaconda3\lib\site-packages\pandas\core\generic.py:6786: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._update_inplace(new_data)

How can I avoid the warning, and what does it mean?

Praveen Kumar-M

1 Answer


You are seeing this warning because pandas cannot tell whether the slice/index operation that produced your object returned a view or a copy of the original DataFrame. The warning exists to flag "chained assignment" operations.
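As a minimal sketch of what "chained assignment" means (the frame `df` and its columns are made up for illustration, not taken from the question), compare a two-step indexing assignment with a single `.loc` call:

```python
import pandas as pd

# Hypothetical example data, not the OP's DataFrame.
df = pd.DataFrame({"A": ["Yes", "No", "Yes"], "B": [10, 20, 30]})

# Chained assignment uses two indexing steps. pandas cannot guarantee that
# the intermediate object df[df["A"] == "Yes"] is a view of df rather than a
# copy, so the write may be lost and a warning is raised:
#     df[df["A"] == "Yes"]["B"] = 0

# A single indexing step with .loc writes to df itself.
df.loc[df["A"] == "Yes", "B"] = 0

print(df)
```

The chained form is left as a comment because its exact behaviour depends on the pandas version.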

You can suppress it by making an explicit deep copy of the DataFrame before modifying it:

datainput = datainput.copy(deep=True)
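A rough sketch of how this might look end to end, assuming `datainput` was obtained by filtering a larger frame (the frame `raw`, the `KEEP` column and the filter are hypothetical; the question does not show how `datainput` was created). It also swaps `replace(..., inplace=True)` for `Series.map` with the result assigned back, which is one common way to avoid modifying an intermediate object:

```python
import pandas as pd

# Hypothetical source frame; the OP's real data is not shown.
raw = pd.DataFrame({
    "COLUMN1": ["Yes", "No", "Yes"],
    "COLUMN2": ["No", "No", "Yes"],
    "KEEP": [True, True, False],
})

# Subsetting is the kind of step that leaves pandas unsure whether the result
# is a view or a copy. The explicit copy makes datainput own its data.
datainput = raw[raw["KEEP"]].copy(deep=True)

mapping = {"Yes": 1, "No": 0}
for col in ["COLUMN1", "COLUMN2"]:
    # Assign the converted column back instead of modifying it in place.
    datainput[col] = datainput[col].map(mapping)

print(datainput)
print(raw)  # the original frame still holds "Yes"/"No"
```

Because `datainput` is an independent copy, the original `raw` frame is left unchanged.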

It is often cleaner to use numpy.where in such cases. Take this for example:

In [1685]: import numpy as np
      ...: import pandas as pd

In [1686]: df = pd.DataFrame({'A': ['Yes', 'No'], 'B':['Yes', 'Yes']})
In [1687]: df
Out[1687]: 
     A    B
0  Yes  Yes
1   No  Yes

In [1690]: df['A'] = np.where(df['A'].eq('Yes'), 1, 0)
In [1691]: df['B'] = np.where(df['B'].eq('Yes'), 1, 0) 

In [1692]: df
Out[1692]: 
   A  B
0  1  1
1  0  1

No warning is raised in this case.
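One thing to keep in mind with `np.where`: every value that is not "Yes", including missing values, ends up as 0. The sketch below illustrates this on a made-up Series, using `Series.map` as a stand-in for a value-by-value mapping that leaves `NaN` as `NaN`:

```python
import numpy as np
import pandas as pd

# Hypothetical column containing a missing value.
s = pd.Series(["Yes", "No", np.nan])

# np.where only tests for "Yes"; everything else, NaN included, becomes 0.
via_where = np.where(s.eq("Yes"), 1, 0)

# A dictionary mapping converts only the listed values; NaN stays NaN.
via_map = s.map({"Yes": 1, "No": 0})

print(via_where)  # [1 0 0]
print(via_map)
```

Which behaviour is preferable depends on whether missing values should be treated as "No" or kept as missing.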

Mayank Porwal
  • This won't solve the problem mentioned in OP's post. Your data is not a copy/slice of any other. – Quang Hoang May 25 '20 at 15:13
  • Agreed. But it's better to use `np.where` in these cases. Wouldn't you agree? – Mayank Porwal May 25 '20 at 15:14
  • Not necessarily true. I don't know whether that `replace` command will work, but, for example, it won't replace values other than `Yes` and `No`, e.g. `NaN`, while `np.where` will. That said, the main question is why the warning appears, not how to do things. – Quang Hoang May 25 '20 at 15:16
  • @QuangHoang I've put the explanation answering OP's question. Sorry for missing it earlier. – Mayank Porwal May 25 '20 at 15:26
  • @MayankPorwal Thanks for the explanation. Can you please elaborate a bit more on `deep` and `chained assignment`? If I am viewing the copy, does that mean the main data frame (before replacement) is still there but hidden? – Praveen Kumar-M May 25 '20 at 16:21
  • @QuangHoang How will np.where replace the NaN? – Praveen Kumar-M May 25 '20 at 16:25
  • Hey @PraveenKumar-M. For details regarding `chained assignments`, please follow this [`link`](https://stackoverflow.com/questions/21463589/pandas-chained-assignments). A `deep copy` is basically a copy where changes made to the copy are not reflected in the original object. Hence, in your case, if you create a deep copy, the original df will remain unaffected and no warning will appear. – Mayank Porwal May 25 '20 at 16:46