I am trying to find the score of a given data set with respect to some training data. I have written the following code:

from sklearn.ensemble import RandomForestClassifier
import numpy as np

randomForest = RandomForestClassifier(n_estimators = 200)

li_train1 =  [[1,2,3,4,5,6,7,8,9],[1,2,3,4,5,6,7,8,9]]

li_train2 =  [[1,2,3,4,5,6,7,8,9],[1,2,3,4,5,6,7,8,9]]

li_text1 = [[10,20,30,40,50,60,70,80,90], [10,20,30,40,50,60,70,80,90]]

li_text2 = [[1,2,3,4,5,6,7,8,9],[1,2,3,4,5,6,7,8,9]]

randomForest.fit(li_train1, li_train2)

output =  randomForest.score(li_train1, li_text1)

When I run the program I get the following error:

Traceback (most recent call last):
  File "trial.py", line 16, in <module>
    output =  randomForest.score(li_train1, li_text1)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/base.py", line 349, in score
    return accuracy_score(y, self.predict(X), sample_weight=sample_weight)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/metrics/classification.py", line 172, in accuracy_score
    y_type, y_true, y_pred = _check_targets(y_true, y_pred)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/metrics/classification.py", line 89, in _check_targets
    raise ValueError("{0} is not supported".format(y_type))
ValueError: multiclass-multioutput is not supported

On checking the documentation related to the score method it says:

score(X, y, sample_weight=None)
X : array-like, shape = (n_samples, n_features)
    Test samples.

y : array-like, shape = (n_samples) or (n_samples, n_outputs)
    True labels for X.

In my case both X and y are 2D arrays.

I also went through this question, but I couldn't understand where I am going wrong.

EDIT

So as per the answer and the comments that follow, I have edited the program as follows:

from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import MultiLabelBinarizer
import numpy as np

randomForest = RandomForestClassifier(n_estimators = 200)

mlb = MultiLabelBinarizer()

li_train1 =  [[1,2,3,4,5,6,7,8,9],[1,2,3,4,5,6,7,8,9]]

li_train2 =  [[1,2,3,4,5,6,7,8,9],[1,2,3,4,5,6,7,8,9]]

li_text1 = [100,200]

li_text2 = [[1,2,3,4,5,6,7,8,9],[1,2,3,4,5,6,7,8,9]]

randomForest.fit(li_train1, li_train2)

output =  randomForest.score(li_train1, li_text1)

After this edit I am getting the error:

Traceback (most recent call last):
  File "trial.py", line 19, in <module>
    output =  randomForest.score(li_train1, li_text1)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/base.py", line 349, in score
    return accuracy_score(y, self.predict(X), sample_weight=sample_weight)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/metrics/classification.py", line 172, in accuracy_score
    y_type, y_true, y_pred = _check_targets(y_true, y_pred)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/metrics/classification.py", line 82, in _check_targets
    "".format(type_true, type_pred))
ValueError: Can't handle mix of binary and multiclass-multioutput
praxmon

1 Answer

According to the documentation:

Warning: At present, no metric in sklearn.metrics supports the multioutput-multiclass classification task.

The score method calls sklearn's accuracy_score metric (you can see this in your traceback), and that metric does not support the multi-class, multi-output targets you've defined.
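To see how sklearn classifies a target array before scoring it, you can call type_of_target from sklearn.utils.multiclass. This is only a small diagnostic sketch. The variable names are mine, and the lists are copied from the question (the flat list [100, 200] comes from the edited version).

```python
from sklearn.utils.multiclass import type_of_target

# 2D target from the question: 2 samples, 9 outputs each, more than 2 distinct values
y_2d = [[1, 2, 3, 4, 5, 6, 7, 8, 9], [1, 2, 3, 4, 5, 6, 7, 8, 9]]

# Flat target with one label per sample and only two distinct values
y_flat = [100, 200]

kind_2d = type_of_target(y_2d)
kind_flat = type_of_target(y_flat)

print(kind_2d)    # multiclass-multioutput
print(kind_flat)  # binary
```

A model fitted on the 2D target also predicts 2D output, so scoring it against a flat list mixes the two kinds. That matches the "mix of binary and multiclass-multioutput" message.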

It's not clear from your question if you really intend to solve a multi-class, multi-output problem. If that's not your intention, then you should restructure your input arrays.

If on the other hand you really want to solve this kind of problem, you'll simply need to define your own scoring function.
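As a rough illustration of what such a scoring function could look like, here is a minimal sketch using only NumPy. The name multioutput_accuracy and the choice of the two measures (mean per-output accuracy and exact-match accuracy) are my own assumptions, not something sklearn provides under that name. Pick whichever definition suits your problem.

```python
import numpy as np

def multioutput_accuracy(y_true, y_pred):
    """Hypothetical helper: return (mean per-output accuracy, exact-match accuracy)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must have the same shape")
    matches = (y_true == y_pred)         # one boolean per (sample, output) cell
    per_output = matches.mean(axis=0)    # accuracy of each output column
    exact = matches.all(axis=1).mean()   # fraction of samples with every output correct
    return per_output.mean(), exact

# Toy data: 2 samples, 3 outputs; one cell in the first row is wrong
y_true = [[1, 2, 3], [4, 5, 6]]
y_pred = [[1, 2, 0], [4, 5, 6]]

mean_acc, exact_acc = multioutput_accuracy(y_true, y_pred)
print(mean_acc, exact_acc)
```

With a fitted model you would pass the result of its predict method on the test samples as y_pred, instead of calling score.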

UPDATE

Since you are not solving a multi-class, multi-output problem, you should restructure your data so that y holds one label per sample, something like this:

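from sklearn.ensemble import RandomForestClassifier

# create the classifier (same settings as in the question)
randomForest = RandomForestClassifier(n_estimators = 200)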

# training data
X =  [
    [1,2,3,4,5,6,7,8,9],
    [1,2,3,4,5,6,7,8,9]
]

y =  [0,1]

# fit the model
randomForest.fit(X,y)

# test data
Xtest =  [
    [1,2,0,4,5,6,0,8,9],
    [1,1,3,1,5,0,7,8,9]
]

ytest =  [0,1]

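output =  randomForest.score(Xtest,ytest)
# the two training rows are identical, so the forest cannot split on any feature
# and predicts the same class for every test row; with ytest = [0,1] one is correct
print(output) # 0.5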
Ryan Walker