How to fit Sklearn Pipeline on Catboost Classifier with Embedding features

Asked Jul 04 '22 at 10:47

Active Jul 04 '22 at 12:21

Viewed 398 times

I have a CatBoost classifier that predicts on some embedding features. As far as I know, these embedding features can only be specified through a Pool, meaning I have to create a Pool and pass it to the classifier's .fit method for the model to pick them up.

These embedding features are generated by a TfidfVectorizer, so I would like to wrap the TfidfVectorizer and the classifier as part of an sklearn Pipeline to tidy up my code and have a clear pipeline to train/predict.

Unfortunately, I cannot pass a CatBoost Pool to an sklearn Pipeline. When I do, I get the following error:

Expected 2D array, got scalar array instead:
array=<catboost.core.Pool object at 0x7f98f0256820>.

Is there any way around this?
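One possible direction, offered only as a sketch: rather than passing a Pool into the Pipeline, the last pipeline step could build the Pool itself inside fit and predict, so the earlier steps only ever exchange plain 2D arrays. The class name PoolBuildingClassifier, the pool_factory argument, and the FakePool/FakeModel stand-ins below are all hypothetical and exist only so the example runs without catboost installed. In real use the wrapper would probably also need to subclass sklearn.base.BaseEstimator to behave well inside a Pipeline.

```python
class PoolBuildingClassifier:
    """Final-step wrapper that builds the pool object itself.

    The pipeline hands this step a plain feature matrix; the wrapper
    turns it into a pool right before calling the wrapped model.
    """

    def __init__(self, model, pool_factory):
        self.model = model                # assumed: e.g. a CatBoostClassifier
        self.pool_factory = pool_factory  # callable (X, y_or_None) -> pool

    def fit(self, X, y=None):
        # Build the pool here instead of outside the pipeline.
        self.model.fit(self.pool_factory(X, y))
        return self

    def predict(self, X):
        # No labels are available at prediction time.
        return self.model.predict(self.pool_factory(X, None))


# Hypothetical stand-ins so the sketch runs without catboost.
class FakePool:
    def __init__(self, data, label=None):
        self.data = data
        self.label = label


class FakeModel:
    def fit(self, pool):
        self.seen_rows = len(pool.data)
        # Remember the most frequent training label.
        self.majority = max(set(pool.label), key=pool.label.count)
        return self

    def predict(self, pool):
        # Always predict the majority label, one per input row.
        return [self.majority] * len(pool.data)


clf = PoolBuildingClassifier(FakeModel(), lambda X, y: FakePool(X, y))
clf.fit([[0.1, 0.9], [0.8, 0.2], [0.7, 0.3]], [1, 0, 0])
predictions = clf.predict([[0.5, 0.5], [0.2, 0.8]])
```

With catboost, the factory would presumably wrap catboost.Pool and pass its embedding_features argument. How the TfidfVectorizer output has to be reshaped into an embedding column is not covered here and would need to be checked against the CatBoost documentation.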

edited Jul 04 '22 at 12:21

Flavia Giammarino

asked Jul 04 '22 at 10:47

Edouard Malet

0 Answers