I am trying to understand pandas' SettingWithCopyWarning: what exactly triggers it and how to avoid it. I want to take a selection of columns from a data frame and then work with that selection. I need to fill missing values and replace all values larger than 1 with 1.
I understand that sub_df=df[['col1', 'col2', 'col3']] produces a copy, and that seems to be what I want. Could someone explain why the copy warning is triggered here, whether it's a problem, and how I should avoid it?
I have read a lot about chained assignment in this context. Am I doing that here?
import pandas as pd

data = {'col1': [25, 0, 100, None],
        'col2': [50, 0, 0, None],
        'col3': [None, None, None, 100],
        'col4': [20, 20, 20, 20],
        'col5': [1, 1, 2, 3]}
df = pd.DataFrame(data)
sub_df = df[['col1', 'col2', 'col3']]
sub_df.fillna(0, inplace=True)
sub_df[df > 1] = 1  # produces the copy warning
sub_df
What really confuses me is why this warning is not triggered if I assign the subset back to the same name instead of using a new name, as below:
import pandas as pd

data = {'col1': [25, 0, 100, None],
        'col2': [50, 0, 0, None],
        'col3': [None, None, None, 100],
        'col4': [20, 20, 20, 20],
        'col5': [1, 1, 2, 3]}
df = pd.DataFrame(data)
df = df[['col1', 'col2', 'col3']]
df.fillna(0, inplace=True)
df[df > 1] = 1  # does not produce the copy warning
df
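For comparison, here is a minimal sketch of the variant I would expect to be warning-free, assuming that an explicit .copy() is the intended way to tell pandas the subset is independent of the original frame. Building the mask from sub_df itself and reassigning the result of fillna instead of using inplace=True are my own choices here, not something I have seen stated as required.

```python
import pandas as pd

data = {'col1': [25, 0, 100, None],
        'col2': [50, 0, 0, None],
        'col3': [None, None, None, 100],
        'col4': [20, 20, 20, 20],
        'col5': [1, 1, 2, 3]}
df = pd.DataFrame(data)

# Explicit copy: sub_df is meant to be independent of df from here on.
sub_df = df[['col1', 'col2', 'col3']].copy()

# Reassign instead of modifying in place.
sub_df = sub_df.fillna(0)

# Build the boolean mask from sub_df itself, then cap values above 1 at 1.
sub_df[sub_df > 1] = 1

print(sub_df)
```

If this is the right approach, df should stay untouched while sub_df holds only the capped values.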
Thanks!