RandomForest score method ValueError

Question

I am trying to find the score of a given data set with respect to some training data. I have written the following code:

from sklearn.ensemble import RandomForestClassifier
import numpy as np

randomForest = RandomForestClassifier(n_estimators = 200)

li_train1 =  [[1,2,3,4,5,6,7,8,9],[1,2,3,4,5,6,7,8,9]]

li_train2 =  [[1,2,3,4,5,6,7,8,9],[1,2,3,4,5,6,7,8,9]]

li_text1 = [[10,20,30,40,50,60,70,80,90], [10,20,30,40,50,60,70,80,90]]

li_text2 = [[1,2,3,4,5,6,7,8,9],[1,2,3,4,5,6,7,8,9]]

randomForest.fit(li_train1, li_train2)

output =  randomForest.score(li_train1, li_text1)

When I run the program, I get the following error:

Traceback (most recent call last):
  File "trial.py", line 16, in <module>
    output =  randomForest.score(li_train1, li_text1)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/base.py", line 349, in score
    return accuracy_score(y, self.predict(X), sample_weight=sample_weight)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/metrics/classification.py", line 172, in accuracy_score
    y_type, y_true, y_pred = _check_targets(y_true, y_pred)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/metrics/classification.py", line 89, in _check_targets
    raise ValueError("{0} is not supported".format(y_type))
ValueError: multiclass-multioutput is not supported

On checking the documentation related to the score method it says:

score(X, y, sample_weight=None)
X : array-like, shape = (n_samples, n_features)
    Test samples.

y : array-like, shape = (n_samples) or (n_samples, n_outputs)
    True labels for X.

In my case, both X and y are 2D arrays.

I also went through this question, but I couldn't understand where I am going wrong.

EDIT

So as per the answer and the comments that follow, I have edited the program as follows:

from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import MultiLabelBinarizer
import numpy as np

randomForest = RandomForestClassifier(n_estimators = 200)

mlb = MultiLabelBinarizer()

li_train1 =  [[1,2,3,4,5,6,7,8,9],[1,2,3,4,5,6,7,8,9]]

li_train2 =  [[1,2,3,4,5,6,7,8,9],[1,2,3,4,5,6,7,8,9]]

li_text1 = [100,200]

li_text2 = [[1,2,3,4,5,6,7,8,9],[1,2,3,4,5,6,7,8,9]]

randomForest.fit(li_train1, li_train2)

output =  randomForest.score(li_train1, li_text1)

After this edit I am getting the error:

Traceback (most recent call last):
  File "trial.py", line 19, in <module>
    output =  randomForest.score(li_train1, li_text1)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/base.py", line 349, in score
    return accuracy_score(y, self.predict(X), sample_weight=sample_weight)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/metrics/classification.py", line 172, in accuracy_score
    y_type, y_true, y_pred = _check_targets(y_true, y_pred)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/metrics/classification.py", line 82, in _check_targets
    "".format(type_true, type_pred))
ValueError: Can't handle mix of binary and multiclass-multioutput

Ryan Walker · Answer 1 · 2016-11-18T13:07:01.780

According to the documentation:

Warning: At present, no metric in sklearn.metrics supports the multioutput-multiclass classification task.

The score method calls sklearn's accuracy metric (accuracy_score in your traceback), which does not support the multi-class, multi-output classification problem you've defined.
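To see how scikit-learn classifies the targets involved, `sklearn.utils.multiclass.type_of_target` can be called on them directly. The following is a minimal sketch that reuses the arrays from the question; the variable names `y_2d` and `y_1d` are illustrative only.

```python
from sklearn.utils.multiclass import type_of_target

# y as passed in the original program: 2 samples, 9 label columns,
# with more than two distinct values
y_2d = [[1, 2, 3, 4, 5, 6, 7, 8, 9], [1, 2, 3, 4, 5, 6, 7, 8, 9]]

# y as passed to score() in the edited program: one label per sample,
# two distinct values
y_1d = [100, 200]

kind_2d = type_of_target(y_2d)
kind_1d = type_of_target(y_1d)

print(kind_2d)
print(kind_1d)
```

The 2D target is reported as multiclass-multioutput, the type named in the first traceback. The 1D target is reported as binary, which is why the edited program, still fitted on a 2D y, complains about a mix of binary and multiclass-multioutput.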

It's not clear from your question if you really intend to solve a multi-class, multi-output problem. If that's not your intention, then you should restructure your input arrays.

If on the other hand you really want to solve this kind of problem, you'll simply need to define your own scoring function.
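As a rough sketch of what such a scoring function could look like, here are two possible definitions written with NumPy only. Both function names are hypothetical (they are not part of scikit-learn), and which definition is appropriate depends on what you want "accuracy" to mean when every sample has several outputs.

```python
import numpy as np

def mean_per_output_accuracy(y_true, y_pred):
    """Accuracy computed separately for each output column, then averaged."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must have the same shape")
    per_output = (y_true == y_pred).mean(axis=0)
    return float(per_output.mean())

def exact_match_accuracy(y_true, y_pred):
    """Fraction of samples for which every output is predicted correctly."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must have the same shape")
    return float((y_true == y_pred).all(axis=1).mean())

# Hand-written example: 2 samples, 3 outputs, one wrong value in the first row
y_true = [[1, 2, 3], [4, 5, 6]]
y_pred = [[1, 2, 0], [4, 5, 6]]

per_output_score = mean_per_output_accuracy(y_true, y_pred)
exact_score = exact_match_accuracy(y_true, y_pred)

print(per_output_score)
print(exact_score)
```

With a fitted multi-output classifier, `y_pred` would come from `randomForest.predict(X)`, which returns one column per output.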

UPDATE

Since you are not solving a multi-class, multi-output problem, you should restructure your data so that y holds a single label per row of X, something like this:

from sklearn.ensemble import RandomForestClassifier

# training data
X =  [
    [1,2,3,4,5,6,7,8,9],
    [1,2,3,4,5,6,7,8,9]
]

y =  [0,1]

# create and fit the model
randomForest = RandomForestClassifier(n_estimators=200)
randomForest.fit(X, y)

# test data
Xtest =  [
    [1,2,0,4,5,6,0,8,9],
    [1,1,3,1,5,0,7,8,9]
]

ytest =  [0,1]

output =  randomForest.score(Xtest,ytest)
print(output) # 0.5
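The data above pairs a 2D X with a 1D y whose length equals the number of rows in X. A small check along these lines can catch a mismatch before calling score. This is only a sketch; the helper name `check_single_output_shapes` is hypothetical and not part of scikit-learn.

```python
import numpy as np

def check_single_output_shapes(X, y):
    """Return True if X is 2D, y is 1D, and both have the same number of samples."""
    X = np.asarray(X)
    y = np.asarray(y)
    return X.ndim == 2 and y.ndim == 1 and X.shape[0] == y.shape[0]

X = [
    [1, 2, 3, 4, 5, 6, 7, 8, 9],
    [1, 2, 3, 4, 5, 6, 7, 8, 9],
]

y_good = [0, 1]  # one label per row of X
y_2d = [
    [10, 20, 30, 40, 50, 60, 70, 80, 90],
    [10, 20, 30, 40, 50, 60, 70, 80, 90],
]                # 2D target: a multi-output problem
y_row = y_2d[0]  # 1D, but 9 labels for only 2 samples

ok_good = check_single_output_shapes(X, y_good)
ok_2d = check_single_output_shapes(X, y_2d)
ok_row = check_single_output_shapes(X, y_row)

print(ok_good, ok_2d, ok_row)
```

Only `y_good` passes. `y_row` has the right dimensionality but the wrong length, which is the situation behind an "inconsistent numbers of samples" error.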

edited Nov 18 '16 at 13:07

answered Nov 18 '16 at 06:28

Ryan Walker


`restructure your input arrays`: How do you mean, should I make one dimensional arrays? – praxmon Nov 18 '16 at 06:29
Are you trying to solve the multiclass, multilabel problem? – Ryan Walker Nov 18 '16 at 06:30
No, I don't know yet, I am just trying stuff out, but for now let's assume I don't have to solve a multi class multi label problem. – praxmon Nov 18 '16 at 06:31
In that case, `y` should be a 1d array and `X` a 2d array. – Ryan Walker Nov 18 '16 at 06:33
So I edited the code to: `output = randomForest.score(li_train1, li_text1[0])` And it gives me the error: `ValueError: Found input variables with inconsistent numbers of samples: [9, 2]` Any idea why? – praxmon Nov 18 '16 at 06:37
The length of `y` must be equal to the number of rows in `X`. – Ryan Walker Nov 18 '16 at 06:47
Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/128417/discussion-between-prakhar-mohan-srivastava-and-ryan-walker). – praxmon Nov 18 '16 at 06:55
Edited. New things have come to light. – praxmon Nov 18 '16 at 06:55
