I have a CatBoost classifier that trains and predicts on some embedding features. AFAIK, these embedding features can only be specified through a Pool, meaning I have to create a Pool and pass it to the CatBoost classifier's .fit method for the model to pick them up.
These embedding features are generated by a TfidfVectorizer, so I would like to wrap the TfidfVectorizer and the classifier as part of an sklearn Pipeline to tidy up my code and have a clear pipeline to train/predict.
Unfortunately, I cannot pass a CatBoost Pool to an sklearn Pipeline, because when I do, I get the following error:
Expected 2D array, got scalar array instead:
array=<catboost.core.Pool object at 0x7f98f0256820>.
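As far as I can tell, this message comes from scikit-learn's input validation, which tries to turn whatever it receives into a NumPy array. An opaque object such as a Pool cannot be converted element by element, so it ends up wrapped in a zero-dimensional ("scalar") object array. Here is a minimal sketch of that behaviour. It assumes the validation boils down to `np.asarray`, and `FakePool` is a hypothetical stand-in I made up so the snippet runs without CatBoost installed:

```python
import numpy as np


class FakePool:
    """Hypothetical stand-in for catboost.Pool: an opaque, non-array object."""


def describe(obj):
    # Assumption: sklearn's validation converts its input roughly like this.
    arr = np.asarray(obj)
    return arr.ndim, arr.dtype


# An arbitrary Python object is wrapped whole into a 0-d object array.
pool_ndim, pool_dtype = describe(FakePool())

# A regular feature matrix converts to the 2D array that sklearn expects.
matrix_ndim, matrix_dtype = describe([[0.1, 0.2], [0.3, 0.4]])

print(pool_ndim, pool_dtype)      # dimensionality and dtype of the wrapped object
print(matrix_ndim, matrix_dtype)  # dimensionality and dtype of the real matrix
```

If that reading is right, the Pipeline never gets as far as CatBoost: the Pool is rejected while the input is still being validated as a plain array.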
Is there any way around this?