tl;dr - make a copy of the slice using .copy(), or suppress the warning with pd.set_option('mode.chained_assignment', None)
There are some great posts about SettingWithCopyWarning. First off, this is just a warning and not an error. Most of the time it is warning you about behavior you didn't intend anyway, or about something you don't really care about.
Now, let's avoid this warning. Given your data, I am going to reproduce the warning on purpose first.
import numpy as np
import pandas as pd

df = pd.DataFrame(data=np.random.randn(2000000, 26),
                  columns=list('ABCDEFGHIJKLMNOPQRSTUVWXYZ'))
# Executing df['Z'] = df['A'] % df['C']/2 on the full dataframe gives no warning.
df['Z'] = df['A'] % df['C']/2
# However, let's slice this dataframe, removing just the last row, using this syntax:
df_slice = df.loc[:1999998]
df_slice['Z'] = df_slice['A'] % df_slice['C']/2
C:\ProgramData\Anaconda3\lib\site-packages\ipykernel_launcher.py:1:
SettingWithCopyWarning: A value is trying to be set on a copy of a
slice from a DataFrame. Try using .loc[row_indexer,col_indexer] =
value instead
See the caveats in the documentation:
https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
"""Entry point for launching an IPython kernel.
In this case, this warning is letting you know you are changing the original df object.
df = pd.DataFrame(data=np.random.randn(2000000, 26),
columns=list('ABCDEFGHIJKLMNOPQRSTUVWXYZ'))
df_slice = df.loc[:1999998]
df_slice['Z'] = df_slice['A'] % df_slice['C']/2
all(df.loc[:1999998, 'Z'] == df_slice['Z'])
This returns the above warning and True: modifying the slice did change the original df object.
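If changing the original really is what you want, the warning text itself suggests writing through .loc on the original dataframe instead of going through a slice. Here is a minimal sketch of that, using a small made-up 4x3 frame (not your data) so the result is easy to check by eye:

```python
import numpy as np
import pandas as pd

# Small illustrative frame (an assumption, not the 2,000,000-row data above).
df = pd.DataFrame(np.arange(12, dtype=float).reshape(4, 3),
                  columns=list('ACZ'))

# .loc slicing is label-based and inclusive, so :2 selects rows 0, 1 and 2.
rows = slice(None, 2)

# Assign directly on the original with .loc[row_indexer, col_indexer] = value,
# as the warning message recommends.
df.loc[rows, 'Z'] = df.loc[rows, 'A'] % df.loc[rows, 'C'] / 2

print(df)
```

Only the first three rows of Z are overwritten; the last row keeps its old value.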
Now, to avoid the warning and leave the original object unchanged, use .copy():
df = pd.DataFrame(data=np.random.randn(2000000, 26),
columns=list('ABCDEFGHIJKLMNOPQRSTUVWXYZ'))
df_slice = df.loc[:1999998].copy()
df_slice['Z'] = df_slice['A'] % df_slice['C']/2
all(df.loc[:1999998, 'Z'] == df_slice['Z'])
Returns no warning and False.
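The same check on a tiny frame makes it easier to see what happens. This is only a sketch with made-up data (the 4x3 frame and the name original_z are mine, not from the question):

```python
import numpy as np
import pandas as pd

# Small illustrative frame (an assumption, not the 2,000,000-row data above).
df = pd.DataFrame(np.arange(12, dtype=float).reshape(4, 3),
                  columns=list('ACZ'))
original_z = df['Z'].copy()  # remember Z before touching the slice

# Explicit copy: df_slice owns its data and is independent of df.
df_slice = df.loc[:2].copy()
df_slice['Z'] = df_slice['A'] % df_slice['C'] / 2

# The original column is untouched, while the copy holds the new values.
unchanged = df['Z'].equals(original_z)
print(unchanged)
print(df_slice['Z'].tolist())
```

Because the slice is an explicit copy, pandas has no reason to warn and the original Z column stays as it was.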
So, one way to keep the performance of your first and second methods is to call .copy() when you create your slice of the dataframe.
However, you are correct that this takes extra memory. If you don't need the original afterwards, you can overwrite your dataframe variable with the .copy() result.
Or you can turn this warning off using:
pd.set_option('mode.chained_assignment', None)
df = pd.DataFrame(data=np.random.randn(2000000, 26),
columns=list('ABCDEFGHIJKLMNOPQRSTUVWXYZ'))
df_slice = df.loc[:1999998]
df_slice['Z'] = df_slice['A'] % df_slice['C']/2
all(df.loc[:1999998, 'Z'] == df_slice['Z'])
Returns no warning and True.
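If you would rather not switch the warning off for the whole session, pd.option_context can apply the same option only inside a with block. The sketch below assumes a pandas version that still has the 'mode.chained_assignment' option, and again uses a small made-up frame:

```python
import numpy as np
import pandas as pd

# Small illustrative frame (an assumption, not the 2,000,000-row data above).
df = pd.DataFrame(np.arange(12, dtype=float).reshape(4, 3),
                  columns=list('ACZ'))
df_slice = df.loc[:2]

before = pd.get_option('mode.chained_assignment')

# The option is set to None only inside this block and restored on exit.
with pd.option_context('mode.chained_assignment', None):
    df_slice['Z'] = df_slice['A'] % df_slice['C'] / 2

after = pd.get_option('mode.chained_assignment')
print(before, after)
```

Outside the with block the setting is back to whatever it was before, so other code still gets the warning.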
In short, pandas sometimes creates a new object for a slice of a dataframe, and sometimes it doesn't, in which case the slice is a view of the original dataframe. Exactly when pandas does which is understood by few and, as far as I could find, not very well documented.
There is a strong hint as to when this warning will appear, and that is the _is_view attribute.
df_slice = df.loc[:1999998]
df_slice._is_view
Returns True, hence the SettingWithCopyWarning might happen.
df_slice = df.loc[:1999998].copy()
df_slice._is_view
Returns False.