I'm using very simple code:
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline

class StringFix(BaseEstimator, TransformerMixin):
    def __init__(self):
        pass

    def fit(self, X, y=None):
        return self

    def transform(self, X, y=None):
        print('Removing NANs.')
        # the next 2 lines raise the SettingWithCopyWarning
        X.loc[:, 'f1'] = 'testing'
        X.loc[:, 'f1'].replace(np.nan, '', inplace=True)
        # this line doesn't raise the warning but, as expected, it
        # doesn't modify the DataFrame either.
        X.loc[:, 'f1'].replace(np.nan, '', inplace=False)
        return X

simplePipe = Pipeline([
    ('string_fix', StringFix()),
])
Interestingly (or not), when I execute this:
trainSetDF = simplePipe.fit_transform(inputDF[:4])
it warns with:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
But it does not warn when I run the assignment directly, outside of the scikit-learn Pipeline:
inputDF.loc[0:4, 'f1'] = 'testing'
Am I missing something here? I've spent a fair amount of time trying to understand why this warning occurs. Now that I understand it, I have fixed some of my code, but no matter what I do inside the Pipeline, I consistently get this warning. Is the Pipeline itself doing something I don't want? Even if I remove the "return X" (which might do something I'm not aware of, such as copying an array), I still get the warning.
Any ideas on what I might be doing wrong?
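One difference between the two calls may be worth noting: the Pipeline receives inputDF[:4], which is a slice of inputDF, while the direct assignment operates on inputDF itself. Below is a minimal sketch of a variant that copies X at the start of transform, assuming the warning comes from assigning into that slice rather than from the Pipeline. The class name CopyingStringFix and the toy inputDF are hypothetical, made up only for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline


class CopyingStringFix(BaseEstimator, TransformerMixin):
    """Hypothetical variant of StringFix that works on its own copy of X."""

    def fit(self, X, y=None):
        return self

    def transform(self, X, y=None):
        # Work on an independent DataFrame instead of the caller's slice.
        X = X.copy()
        X.loc[:, 'f1'] = 'testing'
        # Reassign the result instead of calling replace(..., inplace=True)
        # on a selected column.
        X['f1'] = X['f1'].replace(np.nan, '')
        return X


# Toy data (an assumption; the real inputDF is not shown in the question).
inputDF = pd.DataFrame({
    'f1': ['a', np.nan, 'c', np.nan, 'e'],
    'f2': [1, 2, 3, 4, 5],
})

copyPipe = Pipeline([
    ('string_fix', CopyingStringFix()),
])
trainSetDF = copyPipe.fit_transform(inputDF[:4])
```

With this variant the transformer returns a new DataFrame and inputDF is left untouched, which is a different behavior from the original in-place version.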