How do I create an sklearn pipeline with custom functions?
I have two functions: one for cleaning the data and a second for building the model.
def preprocess(df):
    ……………….
    # clean data
    return df_clean
def model(df_clean):
    …………………
    # split data into train and test sets and build a random forest model
    return model
So I used FunctionTransformer and created a pipeline:
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import FunctionTransformer
pipe = Pipeline([("preprocess", FunctionTransformer(preprocess)), ("model", FunctionTransformer(model))])
pred = pipe.predict_proba(new_test_data)
print(pred)
I know the above is wrong, but I am not sure how to proceed. Do I need to pass the training data to the pipe first, and then pass new_test_data?
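For reference, here is a minimal sketch of the fit-then-predict flow being asked about. It rests on several assumptions: the cleaning step is a hypothetical `fillna(0)` standing in for the real `preprocess`, the data is a made-up toy DataFrame, and the final pipeline step is a `RandomForestClassifier` estimator rather than a function wrapped in FunctionTransformer (only an estimator as the last step gives the pipeline a `predict_proba` method).

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer


def preprocess(df):
    # Hypothetical cleaning step (assumption): replace missing values with 0.
    return df.fillna(0)


# Toy data standing in for the real dataset (assumption).
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(100, 3)), columns=["a", "b", "c"])
X.iloc[::10, 0] = np.nan          # inject some missing values to clean
y = (X["b"] > 0).astype(int)      # simple binary target

# The split happens outside the pipeline; the pipeline only cleans and models.
X_train, new_test_data, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

pipe = Pipeline([
    ("preprocess", FunctionTransformer(preprocess)),
    ("model", RandomForestClassifier(n_estimators=50, random_state=0)),
])

# First fit on the training data, then predict on the new data.
pipe.fit(X_train, y_train)
pred = pipe.predict_proba(new_test_data)
print(pred[:5])
```

In this sketch the training data goes in through `pipe.fit(...)`, and `new_test_data` is passed only to `pipe.predict_proba(...)`, which reapplies the same cleaning function before the model sees it. One caveat: a cleaning function that drops rows would not fit this pattern as written, because the pipeline transforms `X` only and `y` would no longer line up.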