My question is not about how to deal with SettingWithCopyWarning in general. It is about how best to deal with it in a specific context.
I have a function to which an entire DataFrame is passed. Inside this function, I keep just a subset, modify one column, and return the final result. As expected, this triggers the SettingWithCopyWarning.
But modifying a copy is exactly the point: I want to work on chunks of my DataFrame without touching the source, and to free the memory once each chunk has been processed (hence the function). Here's an illustration:
In [1]: import pandas as pd
In [2]: def add_one(df, column):
   ...:     df = df[df['A']==1]
   ...:     df['A'] = df['A'] + 1
   ...:     print(df['A'])
   ...:
In [3]: test_df = pd.DataFrame({'A': [1,1,2]})
In [4]: add_one(test_df, 'A')
0    2
1    2
Name: A, dtype: int64
/tmp/ipykernel_132006/2423483857.py:3: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
df['A'] = df['A'] + 1
In [5]: test_df
Out[5]:
   A
0  1
1  1
2  2
Should I just suppress the warning with pd.set_option('mode.chained_assignment', None) (which I understand is bad practice), or is there a smarter way?
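For comparison, here is a minimal sketch of one alternative to suppressing the warning: take an explicit copy of the subset with .copy() before modifying it. It assumes the function should return the modified subset rather than print it, and that the column parameter (unused in my illustration above) is meant to select the column. The name subset is only illustrative.

```python
import warnings

import pandas as pd


def add_one(df, column):
    # Filter the rows, then take an explicit copy so the result is an
    # independent DataFrame and the assignment below is unambiguous.
    subset = df[df[column] == 1].copy()
    subset[column] = subset[column] + 1
    return subset


test_df = pd.DataFrame({"A": [1, 1, 2]})

# Record any warnings raised during the call so they can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = add_one(test_df, "A")

print(result["A"].tolist())   # [2, 2]
print(test_df["A"].tolist())  # [1, 1, 2] (the source is untouched)
print([w.category.__name__ for w in caught])
```

The copy only holds the filtered rows, and since it is local to the function it should become eligible for garbage collection once the returned result is no longer referenced.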