Creating pipeline in sklearn with custom functions?

Question

How do I create an sklearn pipeline with custom functions? I have two functions: one for cleaning the data and a second for building the model.

def preprocess(df):
   ……………….
   # clean data
   return df_clean

def model(df_clean):
   …………………
   #split data train and test and build randomForest Model
   return model

So I used FunctionTransformer and created a pipeline:

from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import FunctionTransformer

pipe = Pipeline([("preprocess", FunctionTransformer(preprocess)),("model",FunctionTransformer(model))])

pred = pipe.predict_proba(new_test_data)
print(pred)

I know the above is wrong, but I'm not sure how to proceed. Do I need to pass the training data through the pipe first, and then pass new_test_data?

https://stackoverflow.com/questions/31259891/put-customized-functions-in-sklearn-pipeline — Arpit Sisodia, Feb 24 '20 at 10:56
Does this answer your question? [Put customized functions in Sklearn pipeline](https://stackoverflow.com/questions/31259891/put-customized-functions-in-sklearn-pipeline) — Arpit Sisodia, Feb 24 '20 at 10:57

score 4 · Answer 1 · answered Jun 02 '20 at 07:22

You need to create your own class that inherits from sklearn's BaseEstimator and TransformerMixin.

Then call your function from the fit / transform / fit_transform / predict / predict_proba (etc.) methods of that class.
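As a rough sketch of that idea (not the asker's actual code): the cleaning step below is a hypothetical placeholder that just fills missing values, the toy data is invented, and the model step is assumed to be a plain RandomForestClassifier rather than a wrapped function. The pipeline is fitted on the training data first, and only then is predict_proba called on new data.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline


class Preprocess(BaseEstimator, TransformerMixin):
    """Wraps a cleaning function as a pipeline step (hypothetical logic)."""

    def fit(self, X, y=None):
        # This cleaning step is stateless, so there is nothing to learn.
        return self

    def transform(self, X):
        # Placeholder for the real cleaning code: fill missing values with 0.
        return X.fillna(0)


# Toy data, invented purely for illustration.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(100, 3)), columns=["a", "b", "c"])
X.iloc[::10, 0] = np.nan          # inject some missing values to clean
y = (X["b"] > 0).astype(int)      # binary target derived from column "b"

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

pipe = Pipeline([
    ("preprocess", Preprocess()),
    ("model", RandomForestClassifier(n_estimators=50, random_state=0)),
])

pipe.fit(X_train, y_train)          # training data goes through fit first
pred = pipe.predict_proba(X_test)   # then new data goes through predict_proba
```

The last step of a pipeline has to be an estimator that implements predict_proba itself, which is why the model is not wrapped in FunctionTransformer here. For a stateless cleaning function, FunctionTransformer(preprocess) should also work as the first step in place of the custom class.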

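See also: [Put customized functions in Sklearn pipeline](https://stackoverflow.com/questions/31259891/put-customized-functions-in-sklearn-pipeline)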

score 0 · Answer 2 · answered Feb 04 '21 at 10:53

A better and easier way to do this is to use Kedro. It doesn't care about the object type, so you can write any custom function and use it inside a pipeline. You can use kedro.pipeline.Pipeline to put all your functions in sequence and call them as you would in an sklearn pipeline. The syntax is a little different from sklearn's and more flexible.

You can learn more about Kedro in its official documentation.
