I am quite confused about when a SettingWithCopyWarning is raised. For example:
```python
import pandas as pd

df0 = pd.DataFrame([["Fruit", "Apple", 12, 0.3],
                    ["Fruit", "Orange", 23, 0.2],
                    ["Dairy", "Milk", 4, 1],
                    ["Dairy", "Cheese", 1.0, 9.5],
                    ["Meat", "Pork", 8, 11],
                    ["Meat", "Buffalo", 2, 18],
                    ["Fruit", "Strawberry", 45, 2.2]],
                   columns=["Type", "Item", "Quantity", "Price"])

df1 = df0.loc[df0.loc[:, "Price"] < 10, ["Type", "Item", "Price"]]  # a copy(?)
df2 = df0.loc[df0.loc[:, "Price"] < 10]  # also a copy, but maybe this is not always the case?
df1.loc[1, "Item"] = "Banana"  # works fine
df2.loc[1, "Item"] = "Banana"  # raises SettingWithCopyWarning
```
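To separate the two questions (does pandas warn, and does df0 actually change?), here is a minimal sketch that records the warnings raised by each assignment and then checks df0 directly. Whether the warning appears at all depends on the pandas version (versions with Copy-on-Write enabled do not emit SettingWithCopyWarning), so the warning names are only printed, while the effect on df0 is checked explicitly. The helper assign_and_record is a hypothetical name introduced for illustration.

```python
import warnings

import pandas as pd

df0 = pd.DataFrame([["Fruit", "Apple", 12, 0.3],
                    ["Fruit", "Orange", 23, 0.2],
                    ["Dairy", "Milk", 4, 1],
                    ["Dairy", "Cheese", 1.0, 9.5],
                    ["Meat", "Pork", 8, 11],
                    ["Meat", "Buffalo", 2, 18],
                    ["Fruit", "Strawberry", 45, 2.2]],
                   columns=["Type", "Item", "Quantity", "Price"])

mask = df0.loc[:, "Price"] < 10
df1 = df0.loc[mask, ["Type", "Item", "Price"]]  # boolean rows + column list
df2 = df0.loc[mask]                             # boolean rows only


def assign_and_record(frame):
    """Hypothetical helper: set row 1's Item in place and return the names of any warnings raised."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # do not let repeated warnings be suppressed
        frame.loc[1, "Item"] = "Banana"
    return [w.category.__name__ for w in caught]


print("df1 warnings:", assign_and_record(df1))
# Typically includes SettingWithCopyWarning on pandas versions without Copy-on-Write
print("df2 warnings:", assign_and_record(df2))

# Boolean-mask selection produces new arrays, so the original frame is untouched
print(df0.loc[1, "Item"])  # Orange
```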
It seems that df1 is always a copy, whereas df2 is not always a copy (although this time df0 does not change). Why does this happen? I am more interested in understanding the reason than in avoiding the warning itself. I read the pandas documentation on views versus copies, but it is not very enlightening about what loc returns. Quoting:
dfmi.loc is guaranteed to be dfmi itself with modified indexing behavior
and immediately afterwards:
Of course, dfmi.loc.__getitem__(idx) may be a view or a copy of dfmi
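Since the documentation leaves open whether a given .loc result is a view or a copy, one way to check a particular selection empirically is to ask NumPy whether a numeric column of the result shares memory with the same column of df0. This is a minimal sketch, assuming the Price column is representative of how the frame's data is stored; the helper shares_price_data and the label slice df3 are hypothetical additions for comparison. The slice result is printed without an expected output because it may differ between pandas versions and Copy-on-Write settings.

```python
import numpy as np
import pandas as pd

df0 = pd.DataFrame([["Fruit", "Apple", 12, 0.3],
                    ["Fruit", "Orange", 23, 0.2],
                    ["Dairy", "Milk", 4, 1],
                    ["Dairy", "Cheese", 1.0, 9.5],
                    ["Meat", "Pork", 8, 11],
                    ["Meat", "Buffalo", 2, 18],
                    ["Fruit", "Strawberry", 45, 2.2]],
                   columns=["Type", "Item", "Quantity", "Price"])

mask = df0.loc[:, "Price"] < 10
df1 = df0.loc[mask, ["Type", "Item", "Price"]]  # boolean rows + column list
df2 = df0.loc[mask]                             # boolean rows only
df3 = df0.loc[0:2]                              # hypothetical label slice, for comparison


def shares_price_data(sub):
    """Hypothetical helper: True if sub's Price column uses the same memory as df0's."""
    return np.shares_memory(sub["Price"].to_numpy(), df0["Price"].to_numpy())


print(shares_price_data(df1))  # False: boolean indexing gathers rows into new arrays
print(shares_price_data(df2))  # False
print(shares_price_data(df3))  # may vary with pandas version and settings
```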
I saw many interesting discussions, here, here, here (and a few more), but none provides a reproducible example and none actually explains what happens when .loc is used. I saw that false positives sometimes occur and that there are some workarounds (turning off the warning, setting .is_copy = False), but these do not actually tackle the problem.
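For comparison, a minimal sketch of the explicit alternative to silencing the warning: calling .copy() on the selection makes df2 an independent frame, so the assignment unambiguously affects df2 alone. This assumes the intent is to modify df2 and not df0; it sidesteps the warning rather than explaining why it fires.

```python
import warnings

import pandas as pd

df0 = pd.DataFrame([["Fruit", "Apple", 12, 0.3],
                    ["Fruit", "Orange", 23, 0.2],
                    ["Dairy", "Milk", 4, 1],
                    ["Dairy", "Cheese", 1.0, 9.5],
                    ["Meat", "Pork", 8, 11],
                    ["Meat", "Buffalo", 2, 18],
                    ["Fruit", "Strawberry", 45, 2.2]],
                   columns=["Type", "Item", "Quantity", "Price"])

# Explicit copy: df2 no longer refers back to df0
df2 = df0.loc[df0.loc[:, "Price"] < 10].copy()

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    df2.loc[1, "Item"] = "Banana"

names = [w.category.__name__ for w in caught]
print(names)  # no SettingWithCopyWarning expected here
print(df2.loc[1, "Item"], df0.loc[1, "Item"])  # Banana Orange
```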
Any insight? Why is it OK to modify df1 but not df2?