I have the following scikit-learn pipeline for some data preprocessing.

If the dataframe contains a categorical feature, I would like to extract it and run it through SimpleImputer; if there is no such feature (i.e., dataframe['categoricals'] does not exist), I would like the pipeline to simply skip (pass through) this step and proceed to the next one.

How can I achieve this?

Pipeline([
    ('extract', extract_feature(dataframe['categoricals'])),
    ('fill', SimpleImputer(strategy='constant', fill_value='dummy')),
])
tudou
  • If there is no straightforward way to skip the step, is there some way to build a "wrapper" that passes the option (enable/disable the next pipeline step) as some global variable? – tudou Sep 09 '19 at 13:07
  • Does this answer your question? [Is it possible to toggle a certain step in sklearn pipeline?](https://stackoverflow.com/questions/19262621/is-it-possible-to-toggle-a-certain-step-in-sklearn-pipeline) – Ben Reiniger Jun 07 '21 at 13:57

1 Answer

  1. Implement a wrapper around the transformer that takes an extra argument, e.g., if_skip, and toggle this argument to enable or disable the transformer. Of course, you can store if_skip as an instance variable, e.g., self.if_skip, and assign its value from a previous pipeline step if you wish.

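    from sklearn.impute import SimpleImputer

    class SkipSimpleImputer(SimpleImputer):
        def __init__(self, if_skip=False, strategy='constant', fill_value='dummy'):
            super().__init__(strategy=strategy, fill_value=fill_value)
            self.if_skip = if_skip  # True disables the imputation

        def transform(self, X):
            return X if self.if_skip else super().transform(X)  # pass X through when skipping

    SkipSimpleImputer(if_skip=False, strategy='constant', fill_value='dummy')
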
  2. Wrap the step in an if/else :( It is not a nice solution, but at least it is a solution.
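
A minimal sketch of the if/else approach, assuming the check can be done when the pipeline is built. The helper name build_pipeline and the two sample dataframes are made up for illustration; 'passthrough' is the placeholder that scikit-learn's Pipeline accepts in place of a step that should be skipped. The extract step from the question is left out because its implementation is not shown.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline


def build_pipeline(dataframe, column="categoricals"):
    """Hypothetical helper: impute only when `column` exists in the dataframe."""
    if column in dataframe.columns:
        fill_step = SimpleImputer(strategy="constant", fill_value="dummy")
    else:
        # The string 'passthrough' makes Pipeline skip this step entirely.
        fill_step = "passthrough"
    return Pipeline([("fill", fill_step)])


# Illustrative data: one frame with the categorical column, one without.
with_cat = pd.DataFrame({"categoricals": ["a", np.nan, "b"]}, dtype=object)
without_cat = pd.DataFrame({"numbers": [1.0, 2.0, 3.0]})

filled = build_pipeline(with_cat).fit_transform(with_cat)
untouched = build_pipeline(without_cat).fit_transform(without_cat)
```

Here the missing entry in with_cat should be replaced by 'dummy', while without_cat should come back unchanged. The same placeholder can also be swapped in on an existing pipeline with set_params(fill='passthrough'), which may be cleaner than rebuilding it.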

Enxi
  • Thanks. Yes, if/else would work, but it is a very nasty solution. The wrapper seems to be a clean approach; I will give it a try. – tudou Sep 10 '19 at 15:33