I have a function that takes a dataframe as an argument and, while processing it, calls a second function with a slice of the same dataframe as its argument.
All changes are made in place, so nothing is returned (because of the size of the dataframe).
However, the second function raises a SettingWithCopyWarning, since it no longer operates on the original dataframe.
Here is an example:
import pandas as pd
import numpy as np

df = pd.DataFrame(np.arange(9).reshape(3,3), columns=list('abc'))
print(df)

def a(df):
    if df.is_copy:
        print('a got a copy')
    df['a'] = 'a'

def b(df):
    if df.is_copy:
        print('b got a copy')
        print(df.is_copy)
    df.loc[:,'b'] = 'b'

def c(df):
    a(df)
    b(df.loc[0:1,:])  # b receives a slice, not the original dataframe
    if df.is_copy:
        print('c got a copy')
    df.loc[0:1,'c'] = 'c'

def d(df):
    # Workaround: modify an explicit copy, then write it back
    new_df = df.loc[0:1].copy(deep=True)
    b(new_df)
    df.update(new_df)
    del new_df

c(df)
df
Results in:
b got a copy
<weakref at 000000000C1DE778; to 'DataFrame' at 000000000C1B9DA0>
   a  b  c
0  a  1  c
1  a  4  c
2  a  7  8
I understand that one option is to create a new dataframe from the slice of the original, pass it to b, and then call df.update(new_df). The function d shows that this works:
d(df)
df
Produces the desired output:
   a  b  c
0  a  b  c
1  a  b  c
2  a  7  8
But is there a way to deal with this without creating a new dataframe and without raising SettingWithCopyWarning?
Another complication is that the call to b from within c might sometimes be a plain b(df), so slicing is optional.
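One possible direction, shown only as a sketch: pass the original dataframe together with an optional row selector, so that b writes through a single .loc call on the original object instead of on a slice. The rows parameter and the changed signatures of b and c are hypothetical. The astype(object) cast is an assumption, added so that strings can be assigned into the integer columns on recent pandas versions.

```python
import numpy as np
import pandas as pd


def b(df, rows=slice(None)):
    # Hypothetical variant of b: it receives the original frame plus an
    # optional row selector instead of a pre-sliced frame.
    # One .loc call on the original object assigns in place.
    df.loc[rows, 'b'] = 'b'


def c(df, rows=slice(None)):
    # rows=slice(None) selects every row, which covers the plain b(df) case.
    b(df, rows)
    df.loc[rows, 'c'] = 'c'


# Assumption: object dtype, so that mixing strings and integers is allowed.
df = pd.DataFrame(np.arange(9).reshape(3, 3), columns=list('abc')).astype(object)

# .loc slices are label-based and inclusive, so this touches rows 0 and 1.
c(df, slice(0, 1))
print(df)
```

With the default rows=slice(None), calling c(df) updates every row, so the slicing stays optional without needing a second code path.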
Thank you.