I'm using very simple code:

import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline


class StringFix(BaseEstimator, TransformerMixin):
    def __init__(self):
        pass

    def fit(self, X, y=None):
        return self

    def transform(self, X, y=None):
        print('Removing NANs.')
        # The next 2 lines raise the SettingWithCopyWarning.
        X.loc[:, 'f1'] = 'testing'
        X.loc[:, 'f1'].replace(np.nan, '', inplace=True)
        # This line doesn't raise the warning but, as expected, it
        # doesn't modify the DataFrame either.
        X.loc[:, 'f1'].replace(np.nan, '', inplace=False)
        return X


# The pipeline is built after the class it uses has been defined.
simplePipe = Pipeline([
    ('string_fix', StringFix()),
])

Interestingly (or not), when I execute this:

trainSetDF = simplePipe.fit_transform(inputDF[:4])

it warns with:

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

But not when I execute the assignment directly, outside of the scikit-learn Pipeline:

inputDF.loc[0:4, 'f1'] = 'testing'

Am I missing something here? I've spent a fair amount of time trying to understand why this warning occurs. Now that I understand it, I have fixed some code, but regardless of what I do inside the Pipeline, I consistently get this warning. Is the Pipeline itself doing something I don't want? Even if I remove the "return X", which might do things I'm not aware of (like copying an array), I still get the warning.

Any ideas on what I might be doing wrong?

omartin2010
  • Possible duplicate of [How to deal with SettingWithCopyWarning in Pandas?](https://stackoverflow.com/questions/20625582/how-to-deal-with-settingwithcopywarning-in-pandas) – CAPSLOCK Mar 18 '19 at 15:22
  • I wish it were a duplicate, but I can't see it being one, since `.loc[index, col]` works outside of the scikit-learn pipeline but not inside of it... :( – omartin2010 Mar 18 '19 at 15:23
  • Here you set `inplace=False`, hence you need to assign the result to something. That's why it doesn't modify `X`: `X.loc[:, 'f1'].replace(np.nan, '', inplace = False)  # no warning, but does not alter X` – CAPSLOCK Mar 18 '19 at 15:26
  • Thanks, but let me clarify my question/issue. It is about `inplace = True` and the direct assignment (`X.loc[:, 'f1'] = 'testing'`) both throwing a warning. This is what I'm not getting... and when I run the same line outside of the Pipeline, it works without a warning. – omartin2010 Mar 18 '19 at 15:29
  • I believe the problem is in the way you pass your dataframe as an argument to the function. Try the solution I posted. – CAPSLOCK Mar 18 '19 at 15:30
  • Why not just use the basic indexing in pandas: `X['f1'] = 'testing'` then `X['f1'] = X['f1'].replace(np.nan, '')` – It_is_Chris Mar 18 '19 at 15:34
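
The two points made in the comments above (a non-inplace `replace` returns a new object, and plain column assignment writes it back) can be sketched as follows. The DataFrame `df` is hypothetical sample data, not the asker's `inputDF`, and the column is forced to `object` dtype only to keep the behaviour simple to follow.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the asker's DataFrame.
df = pd.DataFrame({"f1": pd.Series(["a", np.nan, "c"], dtype=object)})

# Without inplace=True, replace() returns a new Series; discarding the
# return value leaves df untouched.
df["f1"].replace(np.nan, "")
nans_after_discard = int(df["f1"].isna().sum())

# Assigning the result back to the column is what changes df.
df["f1"] = df["f1"].replace(np.nan, "")
nans_after_assign = int(df["f1"].isna().sum())
```

Whether this form also silences the warning inside the Pipeline depends on what is passed in as `X`, which this sketch does not cover.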

1 Answer

trainSetDF = simplePipe.fit_transform(inputDF.iloc[:4][:])
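
A more explicit variant of the same idea, as a minimal sketch: it assumes the warning arises because `inputDF[:4]` is a slice that pandas flags as a possible copy of `inputDF`, so an assignment made inside `transform` lands on that flagged object. Taking an explicit `.copy()` makes the frame independent. The names `input_df` and `string_fix` are hypothetical stand-ins for the asker's data and for the body of `StringFix.transform`, and `fillna("")` is used here in place of `replace(np.nan, '')`.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the asker's inputDF.
input_df = pd.DataFrame({
    "f1": ["a", np.nan, "c", "d", "e"],
    "f2": [1, 2, 3, 4, 5],
})


def string_fix(X):
    """Stand-in for StringFix.transform: cleans column 'f1' on a copy."""
    X = X.copy()                 # independent frame, no link to the caller's slice
    X["f1"] = X["f1"].fillna("")  # swapped in for replace(np.nan, '')
    return X


# The slice is copied inside the function, so the caller's data is untouched.
result = string_fix(input_df[:4])
```

With the real pipeline, the equivalent call-site change would be `simplePipe.fit_transform(inputDF[:4].copy())`; copying inside `transform` instead has the side benefit that the transformer never mutates its input.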
desertnaut
CAPSLOCK