I am trying to use this approach (https://stackoverflow.com/a/44117716/11102206) to predict 24 variables, but I'm getting "ValueError: Multioutput target data is not supported with label binarization". Any help would be appreciated.

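import xgboost as xgb
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier

# X holds the features; y is a pandas DataFrame with the 24 target columns
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=669)

params = {
    'n_estimators': 10,
    'max_depth': 8,
}

xgbc = xgb.XGBClassifier(**params)
ova_xgbc = OneVsRestClassifier(xgbc)
ova_xgbc.fit(X_train, y_train)  # the ValueError is raised here

ova_preds = ova_xgbc.predict(X_val)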
user11102206
  • Can you give the data (format) of y_train/y_val? – Zealseeker Jul 23 '19 at 14:53
  • Hi Zealseeker it's pandas.core.frame.DataFrame – user11102206 Jul 23 '19 at 15:27
  • No. I was just afraid you did not distinguish multi-class from multi-label. I guess your y of one sample is like [2,4] or [0,1,0,1,...]. It's multi-label. Please see my answer – Zealseeker Jul 23 '19 at 15:34
  • It is multi-label, but from what I saw OneVsRest can be used for multi-label problems as well. I'm sure I am doing something stupid here, as I have no knowledge of this at all. But thank you anyway... – user11102206 Jul 24 '19 at 11:16

1 Answer

Multi-class is different from multi-label. In multi-class classification each sample has exactly one label, taken from more than two classes (e.g. 0 to k with k > 1). In multi-label classification one sample can have several labels at once, e.g. y = [1, 3].
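To check which of these cases a target falls into, scikit-learn's type_of_target helper can be run on y. This is a minimal sketch with made-up arrays. The third array (several columns that are not strictly 0/1) is what scikit-learn calls multi-output, and as far as I can tell that is the case the error message refers to.

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# Multi-class: one label per sample, more than two possible values
y_multiclass = np.array([0, 2, 1, 2])

# Multi-label: one 0/1 column per label, a row may contain several 1s
y_multilabel = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 0]])

# Multi-output: several columns, each holding its own multi-valued class
y_multioutput = np.array([[0, 2], [1, 0], [2, 1]])

kinds = {
    "multiclass": type_of_target(y_multiclass),
    "multilabel": type_of_target(y_multilabel),
    "multioutput": type_of_target(y_multioutput),
}
print(kinds)
```

With a DataFrame target, passing y_train.values to the same helper should give the corresponding answer for the real data.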

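The OneVsRestClassifier from the multiclass module you mentioned turns a binary classifier into a multi-class one. It accepts multi-label targets only as a binary indicator matrix (one 0/1 column per label). If y has several columns that are not strictly 0/1, sklearn treats it as multi-output, which OneVsRestClassifier cannot handle, and you get this error.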

I suggest you browse https://scikit-learn.org/stable/modules/multiclass.html to see which algorithms support multi-label.
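One option described on that page is MultiOutputClassifier, which fits one copy of the base estimator per target column. The following is a minimal sketch, assuming the 24 targets are separate columns of y. The synthetic data from make_multilabel_classification and the DecisionTreeClassifier are stand-ins for the real data and for XGBClassifier, chosen only so that the example needs nothing beyond scikit-learn.

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.model_selection import train_test_split
from sklearn.multioutput import MultiOutputClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the real data: 200 samples, 24 binary target columns
X, y = make_multilabel_classification(
    n_samples=200, n_features=10, n_classes=24, n_labels=3, random_state=0
)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=669
)

# Stand-in base estimator (the question uses xgb.XGBClassifier instead)
base = DecisionTreeClassifier(max_depth=8, random_state=0)

# One independent classifier is trained for each of the 24 columns
clf = MultiOutputClassifier(base)
clf.fit(X_train, y_train)

# Predictions come back with one column per target
preds = clf.predict(X_val)
print(preds.shape)
```

Because every column gets its own model, this treats the targets as independent and ignores any correlation between them.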

In addition, I'd like to introduce another package, scikit-multilearn (http://scikit.ml/index.html). It is built on top of sklearn and has a similar API, but it is designed specifically for multi-label problems.

There are several ways to transform a multi-label problem into a multi-class one, so please have a look at the tutorial first and then decide which algorithm to use.
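As an illustration of one such transformation, the sketch below uses the "label powerset" idea: every distinct combination of labels is treated as a single class, an ordinary multi-class model is trained on those classes, and predictions are mapped back to label rows. The toy arrays and the DecisionTreeClassifier are assumptions made for the example, not part of the question.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Toy data: 6 samples, 3 labels given as 0/1 columns
X = np.array([[0.0], [0.1], [1.0], [1.1], [2.0], [2.1]])
Y = np.array([
    [1, 0, 0], [1, 0, 0],
    [0, 1, 1], [0, 1, 1],
    [1, 1, 0], [1, 1, 0],
])

# Each distinct row of Y becomes one class id (the label powerset)
combos, class_ids = np.unique(Y, axis=0, return_inverse=True)
class_ids = class_ids.ravel()  # keep it 1-D across NumPy versions

# Train an ordinary multi-class model on the combined classes
clf = DecisionTreeClassifier(random_state=0).fit(X, class_ids)

# Map predicted class ids back to rows of labels
Y_pred = combos[clf.predict(X)]
print(Y_pred)
```

This keeps correlations between labels, but the number of classes grows with the number of distinct label combinations, and combinations never seen in training cannot be predicted.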

Zealseeker