I've been looking through all the previous posts but can't seem to find an answer to this.
If I create a column with shift and then assign to it, I keep getting:

SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

How do I remove this warning without suppressing it?

Code:

dfTest = pd.DataFrame(range(10),columns=['A'])
Result:
   A
0  0
1  1
2  2
3  3
4  4
5  5
6  6
7  7
8  8
9  9
dfTest['B'] = dfTest['A'].shift(1)
dfTest['B'].iloc[5] = 20

SettingWithCopyWarning...

Edit: I have tried the following, but it didn't change anything:

dfTest.loc[:, 'B'] = dfTest['A'].shift(1)
dfTest = dfTest.copy()
dfTest = dfTest.copy(deep=True)
bchip
  • Does [How to deal with SettingWithCopyWarning in Pandas](https://stackoverflow.com/questions/20625582/how-to-deal-with-settingwithcopywarning-in-pandas) answer your question? – wwii Dec 02 '20 at 18:09

2 Answers

Why does assignment fail when using chained indexing?

The problem in the previous section is just a performance issue. What’s up with the SettingWithCopy warning? We don’t usually throw warnings around when you do something that might cost a few extra milliseconds!

But it turns out that assigning to the product of chained indexing has inherently unpredictable results. To see this, think about how the Python interpreter executes this code:

dfmi.loc[:, ('one', 'second')] = value
# becomes
dfmi.loc.__setitem__((slice(None), ('one', 'second')), value)

But this code is handled differently:

dfmi['one']['second'] = value
# becomes
dfmi.__getitem__('one').__setitem__('second', value)

See that __getitem__ in there? Outside of simple cases, it’s very hard to predict whether it will return a view or a copy (it depends on the memory layout of the array, about which pandas makes no guarantees), and therefore whether the __setitem__ will modify dfmi or a temporary object that gets thrown out immediately afterward. That’s what SettingWithCopy is warning you about!

Note

You may be wondering whether we should be concerned about the loc property in the first example. But dfmi.loc is guaranteed to be dfmi itself with modified indexing behavior, so dfmi.loc.__getitem__ / dfmi.loc.__setitem__ operate on dfmi directly. Of course, dfmi.loc.__getitem__(idx) may be a view or a copy of dfmi.
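
Applied to the question's dfTest, the same idea can be sketched as follows. This is a minimal example, assuming the default integer index so that row label 5 is also row position 5; it replaces the chained dfTest['B'].iloc[5] = 20 with a single indexing call.

```python
import pandas as pd

dfTest = pd.DataFrame(range(10), columns=['A'])
dfTest['B'] = dfTest['A'].shift(1)

# One .loc call selects the row label and the column together, so the
# assignment is a single __setitem__ on dfTest with no intermediate object.
dfTest.loc[5, 'B'] = 20

# Purely positional equivalent: look up the column position, then use .iloc.
dfTest.iloc[6, dfTest.columns.get_loc('B')] = 30

print(dfTest)
```

Because the row and column are chosen in one step, there is no temporary Series that might be a copy, so the assignment is applied to dfTest directly.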

Sometimes a SettingWithCopy warning will arise at times when there’s no obvious chained indexing going on. These are the bugs that SettingWithCopy is designed to catch! Pandas is probably trying to warn you that you’ve done this:

def do_something(df):
    foo = df[['bar', 'baz']]  # Is foo a view? A copy? Nobody knows!
    # ... many lines here ...
    # We don't know whether this will modify df or not!
    foo['quux'] = value
    return foo

Yikes!
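
One way to make the intent of a function like this explicit is to take a copy of the selection before assigning to it. The sketch below is illustrative only: the sample data and the computation of quux are placeholders, not part of the documentation excerpt.

```python
import pandas as pd

def do_something(df):
    # .copy() makes foo an independent DataFrame, so the assignment below
    # can only affect foo and never df.
    foo = df[['bar', 'baz']].copy()
    foo['quux'] = foo['bar'] + foo['baz']  # placeholder computation
    return foo

# Hypothetical sample data for illustration.
df = pd.DataFrame({'bar': [1, 2], 'baz': [3, 4], 'other': [5, 6]})
result = do_something(df)

print(result)
print(df)
```

If the goal is instead to modify df itself, assigning through df directly (for example df['quux'] = ...) states that intent just as clearly.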

You can safely disable this warning with the following assignment.

import pandas as pd
pd.options.mode.chained_assignment = None  # default='warn'

Reference: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

Adnan Ahmed
  • The question is `How do I remove this warning without suppressing it?` - This doesn't seem like an answer. If you search with some variation of `python pandas SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame site:stackoverflow.com` - do any of those answers have the info you described in your answer? – wwii Dec 02 '20 at 18:06

You can get a reference to the column's underlying array and then change slices of it without triggering the warning. I was surprised this worked and haven't found any documentation to support it; I'll add a link later if I can find one. It works, but I don't know whether there is any inherent risk or problem in doing this. Series.values, Series.to_numpy(), and Series.array all work. Series.to_numpy() is preferred over Series.values.

import numpy as np
import pandas as pd
dfTest = pd.DataFrame(range(10),columns=['A'])
dfTest['B'] = dfTest['A'].shift(1)
x = dfTest.B.values
x[5] = 999
print(dfTest)

Running from a Python shell.

>>> from tmp import *
   A      B
0  0    NaN
1  1    0.0
2  2    1.0
3  3    2.0
4  4    3.0
5  5  999.0
6  6    5.0
7  7    6.0
8  8    7.0
9  9    8.0
>>> x[7] = 1234
>>> dfTest
   A       B
0  0     NaN
1  1     0.0
2  2     1.0
3  3     2.0
4  4     3.0
5  5   999.0
6  6     5.0
7  7  1234.0
8  8     7.0
9  9     8.0
>>> y = dfTest.B.to_numpy()
>>> y[9] = 6543
>>> dfTest
   A       B
0  0     NaN
1  1     0.0
2  2     1.0
3  3     2.0
4  4     3.0
5  5   999.0
6  6     5.0
7  7  1234.0
8  8     7.0
9  9  6543.0
>>>
>>> z = dfTest.B.array
>>> z[2] = 334455
>>> dfTest
   A         B
0  0       NaN
1  1       0.0
2  2  334455.0
3  3       2.0
4  4       3.0
5  5     999.0
6  6       5.0
7  7    1234.0
8  8       7.0
9  9    6543.0
>>>

Series.values and Series.to_numpy() return the same object in my pandas version, 1.1.4. Series.array returns a different object.

>>> x is y
True
>>> x is z
False
>>> y is z
False
>>>
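
Since the sharing shown above is undocumented, it may not hold in every pandas version or for every dtype. A more conservative sketch, assuming you only need the values to work on, is to request an explicit copy with to_numpy(copy=True) and write results back through .loc:

```python
import pandas as pd

dfTest = pd.DataFrame(range(10), columns=['A'])
dfTest['B'] = dfTest['A'].shift(1)

# copy=True guarantees an independent array, so edits to it
# never reach dfTest.
snapshot = dfTest['B'].to_numpy(copy=True)
snapshot[5] = 999
unchanged = dfTest.loc[5, 'B']  # still the shifted value from column A

# Write the change back explicitly with a single indexing call.
dfTest.loc[5, 'B'] = snapshot[5]

print(unchanged)
print(dfTest)
```

This trades the convenience of in-place edits for behaviour that does not depend on whether pandas hands back a view.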
wwii