
I have the following code but don't quite understand why it is throwing the warning. I have read the documentation but still cannot wrap my head around why this usage would result in the warning. Any insight would be appreciated.

>>> import pandas
>>> df = pandas.DataFrame({'a': [1,2,3,4,5,6,7], 'b': [11,22,33,44,55,66,77]})
>>> reduced_df = df[df['a'] > 3]
>>> reduced_df
   a   b
3  4  44
4  5  55
5  6  66
6  7  77
>>> reduced_df['a'] /= 3

Warning (from warnings module):
   File "__main__", line 1
SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_index,col_indexer] = value instead
>>> reduced_df
          a   b
3  1.333333  44
4  1.666667  55
5  2.000000  66
6  2.333333  77
derNincompoop
  • The line `reduced_df = df[df['a'] > 3]` makes `reduced_df` a copy of the slice from `df`, and you are then assigning to that copy, hence the warning. You can take an explicit copy with `reduced_df = df[df['a'] > 3].copy()`, or, if you wanted to modify the original df, use `df.loc[df['a']>3,'a'] = df['a']/3` – EdChum Oct 23 '14 at 20:10
  • I guess the reason the warning exists is that typically you expect in python that `foo = bar` will mean that `foo` is a reference to `bar` so if you modify `foo` then you expect `bar` to change which isn't the case here – EdChum Oct 23 '14 at 20:13
  • So the warning is telling me that any changes I make to reduced_df will not show up in df because the `reduced_df = df[df['a'] > 3]` line produces a deep copy? – derNincompoop Oct 23 '14 at 20:16
  • yes, but because you've not been explicit here it's warning you what will happen, like I said if you did `reduced_df = df[df['a'] >3].copy()` then there is no warning as you have now explicitly taken a deep copy – EdChum Oct 23 '14 at 20:20
  • Excellent - the fact that it was making a deep copy is what I was not getting. – derNincompoop Oct 23 '14 at 20:22

1 Answer


The warning tells you that reduced_df, despite appearances, is not a reference to a slice of df but a copy, so changes made to it will not reach df. This differs from the usual Python semantics, where assignment binds a second name to the same object and, for mutable objects, a modification made through either name is visible through both:

In [14]:

foo = [0]
bar = foo
bar.append(1)
print(foo,bar)
[0, 1] [0, 1]
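
To make the contrast concrete, here is a minimal sketch in plain Python that sets the aliasing shown above next to an explicit copy, which is closer to how reduced_df behaves. The names alias and snapshot are illustrative and do not come from the question.

```python
foo = [0]

alias = foo           # second name bound to the same list object
snapshot = list(foo)  # new list with the same contents (a shallow copy)

foo.append(1)

print(alias)     # the appended element is visible through the alias
print(snapshot)  # the copy still holds only the original element
print(alias is foo, snapshot is foo)
```

The alias and foo are one object, so there is nothing to keep in sync; the snapshot is a separate object and has to be updated separately.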

So if you want to modify a particular slice of df itself, do what the warning suggests and assign through .loc:

In [18]:

df.loc[df['a'] > 3, 'a'] = df['a'] / 3
df
Out[18]:
          a   b
0  1.000000  11
1  2.000000  22
2  3.000000  33
3  1.333333  44
4  1.666667  55
5  2.000000  66
6  2.333333  77
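
A variant of the same assignment, as a sketch: it stores the boolean mask in a variable (mask is an illustrative name) and applies it on both sides, so the right-hand side does not rely on index alignment. Column a is created as float here. That is an assumption added so the column keeps its dtype after the division; the original example starts from integers.

```python
import pandas as pd

# Assumption: 'a' is float from the start, unlike the integer column in the question.
df = pd.DataFrame({'a': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0],
                   'b': [11, 22, 33, 44, 55, 66, 77]})

mask = df['a'] > 3                         # rows to modify
df.loc[mask, 'a'] = df.loc[mask, 'a'] / 3  # single indexing call on df itself

print(df)
```

Because the row selection and the column are given in one .loc call on df, the assignment acts on df directly rather than on an intermediate object.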

Or make an explicit deep copy by calling copy() and modify that copy; no warning is generated:

In [20]:

reduced_df = df[df['a'] > 3].copy()
reduced_df['a'] /= 3
reduced_df
Out[20]:
          a   b
3  1.333333  44
4  1.666667  55
5  2.000000  66
6  2.333333  77

In [21]:
# orig df is unmodified
df
Out[21]:
   a   b
0  1  11
1  2  22
2  3  33
3  4  44
4  5  55
5  6  66
6  7  77
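
Another way to arrive at the same reduced_df, offered as a sketch rather than as part of the answer above: DataFrame.assign returns a new DataFrame with the column replaced, so nothing is assigned in place on the filtered result.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6, 7],
                   'b': [11, 22, 33, 44, 55, 66, 77]})

# Filter first, then build a new frame whose 'a' column is the old one divided by 3.
reduced_df = df[df['a'] > 3].assign(a=lambda d: d['a'] / 3)

print(reduced_df)
print(df)  # the original frame keeps its integer values
```

This keeps the filter and the transformation in one expression, which may be convenient when the original df is not meant to change.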
EdChum