I have a CatBoost classifier that predicts on some embedding features. As far as I know, these embedding features can only be specified through a Pool, meaning I have to create a Pool and pass it to the classifier's .fit method for the model to pick them up.

These embedding features are generated by a TfidfVectorizer, so I would like to wrap the TfidfVectorizer and the classifier in an sklearn Pipeline to tidy up my code and have a single, clear pipeline for training and prediction.

Unfortunately, I cannot pass a CatBoost Pool to an sklearn Pipeline. When I do, I get the following error:

Expected 2D array, got scalar array instead:
array=<catboost.core.Pool object at 0x7f98f0256820>.

Is there any way around this?
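
One possible direction, given here only as a minimal and unverified sketch: instead of passing a Pool into the Pipeline, let the final step build the Pool itself inside fit and predict, so the Pipeline only ever sees plain arrays. The class name `PoolWrappingClassifier` and the `model` / `make_pool` arguments are hypothetical names introduced for illustration; they are not part of CatBoost or scikit-learn. The sketch is deliberately library-free, and a real version would probably need to subclass `sklearn.base.BaseEstimator` to cooperate fully with Pipeline.

```python
class PoolWrappingClassifier:
    """Hypothetical final pipeline step that builds a pool lazily.

    model     -- any object exposing fit(pool) and predict(pool)
    make_pool -- any callable (X, y) -> pool; y is None at predict time
    """

    def __init__(self, model, make_pool):
        self.model = model
        self.make_pool = make_pool

    def fit(self, X, y=None):
        # The upstream steps hand over plain feature data; the pool is
        # created only here, so it never has to pass through the pipeline.
        self.model.fit(self.make_pool(X, y))
        self.is_fitted_ = True  # marker attribute, set after fitting
        return self

    def predict(self, X):
        # Build an unlabeled pool from the transformed features.
        return self.model.predict(self.make_pool(X, None))
```

With CatBoost, `model` would presumably be the CatBoostClassifier and `make_pool` a small function that calls `Pool` with the label and the embedding feature specification. How the vectorizer output has to be shaped for that call is an assumption left open here.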

Flavia Giammarino
