Consider the following sklearn Pipeline:
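    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LinearRegression
    from sklearn.pipeline import make_pipeline

    pipeline = make_pipeline(
        TfidfVectorizer(),
        LinearRegression()
    )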
My TfidfVectorizer is already pretrained, so when I call pipeline.fit(X, y) I want only LinearRegression to be fitted, without refitting TfidfVectorizer.
I could simply apply the transformation in advance and fit LinearRegression on the transformed data. However, my project has many transformers in a pipeline, some pretrained and some not, so I am looking for a way to avoid writing yet another wrapper around sklearn estimators and to stay within a single Pipeline object.
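For reference, here is a minimal sketch of the manual workaround I mean: transform with the already-fitted vectorizer, then fit only the regressor. The toy data and variable names are made up purely for illustration, and the vectorizer fitted on `pretrain_docs` just stands in for my pretrained one.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LinearRegression

# Hypothetical toy data, for illustration only.
pretrain_docs = ["red apple", "green apple", "yellow banana", "green pear"]
X = ["red apple", "yellow banana", "green pear"]
y = [1.0, 2.0, 3.0]

# Stand-in for the pretrained vectorizer: fitted once, somewhere else.
vectorizer = TfidfVectorizer().fit(pretrain_docs)
vocabulary_before = dict(vectorizer.vocabulary_)

# Workaround: transform up front with the fitted vectorizer,
# then fit only the regressor on the transformed data.
X_transformed = vectorizer.transform(X)
regressor = LinearRegression().fit(X_transformed, y)

# The vectorizer was never refitted, so its vocabulary is unchanged.
vocabulary_unchanged = vectorizer.vocabulary_ == vocabulary_before

# Prediction has to repeat the manual transform step as well.
predictions = regressor.predict(vectorizer.transform(["green apple"]))
```

This works, but every pretrained step has to be handled by hand outside the Pipeline, which is exactly what I would like to avoid.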
To my mind, there should be a parameter on the estimator object that prevents it from being refitted when .fit() is called if the object is already fitted.