I am quite confused about when a SettingWithCopyWarning is raised. For example:

import pandas as pd

df0 = pd.DataFrame([["Fruit", "Apple", 12, 0.3],
                    ["Fruit", "Orange", 23, 0.2],
                    ["Dairy", "Milk", 4, 1],
                    ["Dairy", "Cheese", 1.0, 9.5],
                    ["Meat", "Pork", 8, 11],
                    ["Meat", "Buffalo", 2, 18],
                    ["Fruit", "Strawberry", 45, 2.2]],
                   columns=["Type", "Item", "Quantity", "Price"])

df1 = df0.loc[df0.loc[:, "Price"] < 10, ["Type", "Item", "Price"]]  # a copy(?)
df2 = df0.loc[df0.loc[:, "Price"] < 10]  # also a copy, but maybe this is not always the case?

df1.loc[1, "Item"] = "Banana"  # works fine
df2.loc[1, "Item"] = "Banana"  # raises SettingWithCopyWarning
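To make the comparison reproducible, here is a minimal sketch of one way to record which warnings each assignment emits and to confirm that df0 is left untouched. The helper name `assign_and_record` is made up for illustration, and whether a warning appears at all is an assumption about the installed pandas version, so the printed lists may differ between installations.

```python
import warnings

import pandas as pd

df0 = pd.DataFrame([["Fruit", "Apple", 12, 0.3],
                    ["Fruit", "Orange", 23, 0.2],
                    ["Dairy", "Milk", 4, 1],
                    ["Dairy", "Cheese", 1.0, 9.5],
                    ["Meat", "Pork", 8, 11],
                    ["Meat", "Buffalo", 2, 18],
                    ["Fruit", "Strawberry", 45, 2.2]],
                   columns=["Type", "Item", "Quantity", "Price"])


def assign_and_record(frame, row, col, value):
    """Hypothetical helper: run frame.loc[row, col] = value and
    return the names of any warning categories emitted meanwhile."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # do not let filters hide anything
        frame.loc[row, col] = value
    return [w.category.__name__ for w in caught]


mask = df0.loc[:, "Price"] < 10
df1 = df0.loc[mask, ["Type", "Item", "Price"]]  # rows and columns selected
df2 = df0.loc[mask]                             # rows only

w1 = assign_and_record(df1, 1, "Item", "Banana")
w2 = assign_and_record(df2, 1, "Item", "Banana")

print("df1 warnings:", w1)  # version dependent
print("df2 warnings:", w2)  # version dependent
print("df0 row 1 item:", df0.loc[1, "Item"])  # original frame, for comparison
```

In both cases the assignment lands in the new frame and df0 keeps its original value, so the difference between df1 and df2 is only in whether a warning is emitted, not in what gets modified.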

It seems that df1 is always a copy, whereas df2 is not always a copy (although this time df0 does not change). Why does this happen? I am more interested in understanding the reason than in avoiding the warning itself. I read pandas' documentation on view vs copy, but it is not very enlightening about what loc returns. Quoting:

dfmi.loc is guaranteed to be dfmi itself with modified indexing behavior

and immediately afterwards:

Of course, dfmi.loc.__getitem__(idx) may be a view or a copy of dfmi

I saw many interesting discussions, here, here, here (and a few more), but none provides a reproducible example and none actually explains what happens when .loc is used. I saw that false positives can sometimes occur and that there are some workarounds (turning off the warning, setting .is_copy = False), but these do not address the underlying cause.

Any insight? Why is it ok to modify df1 but not df2?
