I don't understand why I am getting the dreaded warning when I am doing exactly as instructed by the official documentation.

We have a DataFrame called a:

import pandas as pd

a = pd.DataFrame(data=[['Tom', 1],
                       ['Tom', 1],
                       ['Dick', 1],
                       ['Dick', 1],
                       ['Harry', 1],
                       ['Harry', 1]], columns=['Col1', 'Col2'])

a

Out[377]: 
    Col1  Col2
0    Tom     1
1    Tom     1
2   Dick     1
3   Dick     1
4  Harry     1
5  Harry     1

First we create a "holder" dataframe:

holder = a

Then we create a subset of a:

c = a.loc[a['Col1'] == 'Tom',:]

c

Out[379]: 
  Col1  Col2
0  Tom     1
1  Tom     1

We then create another subset, d, to add to (a slice of) the previous subset c. As soon as we try to add d to c, we get the warning:

d = a.loc[a['Col1'] == 'Tom','Col2']

d

Out[389]: 
0    1
1    1


c.loc[:,'Col2'] += d

C:\Users\~\anaconda3\lib\site-packages\pandas\core\indexing.py:494: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self.obj[item] = s

I would like to understand what I am doing wrong, because I use this logic very often (I come from R, where not everything is a darn object).

gmarais
  • Could you add a complete and reproducible code example (i.e. a fully self-standing code example that produces this `SettingWithCopyWarning` when run)? – Xukrao Feb 04 '20 at 19:01
  • Have you read some of the resources on the subject? For example: https://stackoverflow.com/q/20625582/11301900. As an aside, if you want to get a subset of the DataFrame consisting of multiple columns you can simply do `df[['col_1', 'col_3']]` instead of using `.loc[]`. – AMC Feb 04 '20 at 22:44
  • Also, could you share a [mcve]? I'm not managing to reproduce the issue. – AMC Feb 04 '20 at 22:51

1 Answer


After noticing a different issue, I found a solution.

Whenever you say

dataframe_A = dataframe_B

you need to proceed with caution, because Python does not copy anything here: both names end up referring to the same DataFrame, joined at the hip, so to speak. If you make in-place changes to dataframe_B, your dataframe_A will also change!
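To illustrate the point, here is a minimal sketch with a small made-up DataFrame (the data and the variable names same_object and value_seen_via_A are only for illustration, not from my real code). It shows that plain assignment gives you a second name for the same object rather than a second DataFrame:

```python
import pandas as pd

# Small illustrative frame (hypothetical data)
dataframe_B = pd.DataFrame({"Col1": ["Tom", "Dick"], "Col2": [1, 1]})

# Plain assignment copies nothing: both names refer to one object
dataframe_A = dataframe_B
same_object = dataframe_A is dataframe_B

# An in-place change made through one name...
dataframe_B.loc[0, "Col2"] = 99

# ...is visible through the other name
value_seen_via_A = dataframe_A.loc[0, "Col2"]
```

The `is` check compares object identity, which is a quick way to tell whether two names really are "joined at the hip".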

I understand just enough to fix the problem by using .copy(deep=True), which makes pandas create a full and independent copy so that you can make changes to one without affecting the other.
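Applied to the example from the question, a sketch of that fix might look like the following. I am assuming here that the copy is taken at the point where the subset c is created; tom_total and original_col2 are just illustrative names used to inspect the result:

```python
import pandas as pd

a = pd.DataFrame(
    data=[["Tom", 1], ["Tom", 1], ["Dick", 1],
          ["Dick", 1], ["Harry", 1], ["Harry", 1]],
    columns=["Col1", "Col2"],
)

# Take an explicit, independent copy of the subset instead of a bare slice
c = a.loc[a["Col1"] == "Tom", :].copy(deep=True)

# The values to add, aligned on the same index labels (0 and 1)
d = a.loc[a["Col1"] == "Tom", "Col2"]

# Modify the copy; the original frame is left alone
c.loc[:, "Col2"] += d

tom_total = c["Col2"].tolist()
original_col2 = a["Col2"].tolist()
```

Because c is its own object, the assignment only ever touches c, and a keeps its original values.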

On further investigation, and for those interested, it apparently has to do with "pointers" (Python calls them references), a slightly complicated concept whose scope goes beyond this specific question.

gmarais