
I read data in from a CSV file; the first line contains strings and the rest are all decimals. I had to convert the data from strings to Decimal and am now trying to run a decision tree classifier over it. I can train the model just fine, but when I call DecisionTreeClassifier.score() I get the error message: "unknown is not supported"

Here is my code:

cVal = KFold(len(file)-1, n_folds=10, shuffle=True)
for train_index, test_index in cVal:
    obfA_train, obfA_test = np.array(obfA)[train_index], np.array(obfA)[test_index]
    tTime_train, tTime_test = np.array(tTime)[train_index], np.array(tTime)[test_index]
    model = tree.DecisionTreeClassifier()
    model = model.fit(obfA_train.tolist(), tTime_train.tolist())
    print model.score(obfA_test.tolist(), tTime_test.tolist())

I filled obfA and tTime with these lines earlier:

tTime.append(Decimal(file[i][11].strip('"')))
obfA[i-1][j-1] = Decimal(file[i][j].strip('"'))

So obfA is a 2D array and tTime is 1D. Previously I tried removing the tolist() calls in the code above, but it did not affect the error. Here is the traceback it prints:

in <module>()
---> print model.score(obfA_test.tolist(), tTime_test.tolist())

in score(self, X, y, sample_weight)
    """
    from .metrics import accuracy_score
 -->return accuracy_score(y, self.predict(X), sample_weight=sample_weight)

in accuracy_score(y_true, y_pred, normalize, sample_weight)
    # Compute accuracy for each possible representation
  ->y_type, y_true, y_pred = _check_clf_targets(y_true, y_pred)
    if y_type == 'multilabel-indicator':
        score = (y_pred != y_true).sum(axis=1) == 0

in _check_clf_targets(y_true, y_pred)
    if (y_type not in ["binary", "multiclass", "multilabel-indicator", "multilabel-sequences"]):
        -->raise ValueError("{0} is not supported".format(y_type))
    if y_type in ["binary", "multiclass"]:

ValueError: unknown is not supported

I added print statements to check the dimensions of the input parameters. This is what they printed:

obfA_test.shape: (48L, 12L)
tTime_test.shape: (48L,)

I am confused why the error report shows 3 required parameters for score() but the documentation only has 2. What is the "self" parameter? Can anyone help me solve this error?

2 Answers


This looks like the error discussed here. The problem stems from the datatype you use to fit and score the model: when filling your input arrays, try float instead of Decimal. To keep this answer accurate, though: you can't use float/continuous values as the targets of a DecisionTreeClassifier (float features are fine). If your targets are floats, use a DecisionTreeRegressor. Otherwise, try integers or strings as class labels (but that might steer away from the task that you're trying to accomplish).
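To see where the two error messages in this thread may come from, here is a small sketch using scikit-learn's type_of_target helper, which classifies a target array before metrics such as accuracy are computed. It assumes a reasonably recent scikit-learn; the sample values are made up for illustration and are not the asker's data.

```python
from decimal import Decimal

import numpy as np
from sklearn.utils.multiclass import type_of_target

# Made-up targets, standing in for the asker's tTime values.
decimal_targets = np.array([Decimal("1.5"), Decimal("2.25"), Decimal("3.75")])
float_targets = np.array([float(d) for d in decimal_targets])
label_targets = np.array([0, 1, 2, 1])  # discrete class labels

# Decimal objects end up in an object-dtype array, which scikit-learn
# cannot interpret; non-integer floats are treated as regression targets.
kinds = {
    "decimal": type_of_target(decimal_targets),
    "float": type_of_target(float_targets),
    "labels": type_of_target(label_targets),
}
print(kinds)
```

Only the discrete labels are reported as a classification target, which is why converting Decimal to float changes the error message rather than removing it.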

As for the self question at the end: this is a Python convention. When you call model.score(...), Python treats it roughly like score(model, ...), passing the instance as the first argument, so self is never something you supply yourself. That is why the traceback shows three parameters while the documentation lists two. I'm afraid I don't know much more about it than that right now, but it isn't necessary to answer your original question. Here's an answer that better addresses that particular question.
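A minimal sketch of the self behavior, in plain Python. The Model class here is a hypothetical stand-in, not scikit-learn's estimator; it only shows that calling a method on an instance and calling it through the class with the instance as first argument are equivalent.

```python
class Model(object):
    """Hypothetical stand-in for an estimator; not part of scikit-learn."""

    def __init__(self, offset):
        self.offset = offset

    def score(self, X, y):
        # self is the instance the method was called on.
        return self.offset + len(X) + len(y)


model = Model(offset=100)

# Usual call: Python passes `model` as `self` automatically.
bound = model.score([1, 2], [3])

# Equivalent explicit call through the class.
unbound = Model.score(model, [1, 2], [3])

print(bound, unbound)
```

Both calls pass two arguments as far as the caller is concerned; the third parameter in the signature is filled in by Python.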

rabbit
  • I changed everything from Decimal to float, but it changed the error from "unknown is not supported" to "continuous is not supported" – resistancefm Apr 29 '15 at 00:08
  • It would be helpful to have a better understanding of what you want to do. In your main question, can you briefly describe obfA and tTime? I just realized that you may not be able to use floats for DecisionTreeClassifier, but you may find it more useful to use DecisionTreeRegressor, depending upon your task. – rabbit Apr 29 '15 at 00:51

I realized that the problem was that I was trying to use a DecisionTreeClassifier to predict continuous values, when a classifier can only predict discrete values. I will have to switch to a regression model instead.
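For reference, a rough sketch of what the regression version of the cross-validation loop might look like. It assumes a recent scikit-learn, where KFold lives in sklearn.model_selection and takes n_splits, and it uses synthetic data in place of obfA and tTime, so the printed scores say nothing about the real dataset.

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-ins for obfA (features) and tTime (continuous target).
rng = np.random.default_rng(0)
X = rng.random((480, 12))
y = X[:, 0] * 10.0 + rng.normal(scale=0.1, size=480)

cv = KFold(n_splits=10, shuffle=True, random_state=0)
scores = []
for train_index, test_index in cv.split(X):
    model = DecisionTreeRegressor(random_state=0)
    model.fit(X[train_index], y[train_index])
    # For regressors, score() returns R^2 rather than accuracy.
    scores.append(model.score(X[test_index], y[test_index]))

print(scores)
```

Note that score() on a regressor reports the coefficient of determination, so the numbers are not directly comparable to classification accuracy.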