I read data in from a CSV file; the first line contains strings and the rest are all decimals. I converted the data from strings to Decimal objects and am now trying to run a decision tree classifier over it. Training works fine, but when I call DecisionTreeClassifier.score() I get the error message: "unknown is not supported"
Here is my code:
cVal = KFold(len(file)-1, n_folds=10, shuffle=True)
for train_index, test_index in cVal:
    obfA_train, obfA_test = np.array(obfA)[train_index], np.array(obfA)[test_index]
    tTime_train, tTime_test = np.array(tTime)[train_index], np.array(tTime)[test_index]
    model = tree.DecisionTreeClassifier()
    model = model.fit(obfA_train.tolist(), tTime_train.tolist())
    print model.score(obfA_test.tolist(), tTime_test.tolist())
I filled obfA and tTime with these lines earlier:
tTime.append(Decimal(file[i][11].strip('"')))
obfA[i-1][j-1] = Decimal(file[i][j].strip('"'))
So obfA is a 2D array and tTime is 1D. I previously tried removing the tolist() calls in the code above, but that did not change the error. Here is the traceback it prints:
in <module>()
--> print model.score(obfA_test.tolist(), tTime_test.tolist())

in score(self, X, y, sample_weight)
    """
    from .metrics import accuracy_score
--> return accuracy_score(y, self.predict(X), sample_weight=sample_weight)

in accuracy_score(y_true, y_pred, normalize, sample_weight)
    # Compute accuracy for each possible representation
--> y_type, y_true, y_pred = _check_clf_targets(y_true, y_pred)
    if y_type == 'multilabel-indicator':
        score = (y_pred != y_true).sum(axis=1) == 0

in _check_clf_targets(y_true, y_pred)
    if (y_type not in ["binary", "multiclass", "multilabel-indicator", "multilabel-sequences"]):
--> raise ValueError("{0} is not supported".format(y_type))
    if y_type in ["binary", "multiclass"]:

ValueError: unknown is not supported
I added print statements to check the dimensions of the inputs, and this is what they printed:
obfA_test.shape: (48L, 12L)
tTime_test.shape: (48L,)
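Since the shapes look consistent, one possible cause worth checking is the element type rather than the dimensions. This is an assumption, not something the traceback confirms. np.array() over Decimal values produces an array of dtype object, and scikit-learn's target-type check appears to report such arrays as "unknown". A minimal sketch of converting the values to plain floats before building the arrays is below. The helper names and the demo data are hypothetical and only stand in for obfA and tTime:

```python
from decimal import Decimal


def to_float_rows(rows):
    """Convert a 2D list of Decimal values to plain Python floats."""
    return [[float(value) for value in row] for row in rows]


def to_float_list(values):
    """Convert a 1D list of Decimal values to plain Python floats."""
    return [float(value) for value in values]


# Made-up stand-ins for obfA (2D) and tTime (1D); not the real data.
obfA_demo = [[Decimal("1.5"), Decimal("2.0")],
             [Decimal("0.25"), Decimal("4.0")]]
tTime_demo = [Decimal("10.0"), Decimal("12.5")]

features = to_float_rows(obfA_demo)
targets = to_float_list(tTime_demo)
```

If tTime holds continuous measurements rather than class labels, classifier metrics may still reject float targets after this conversion. In that case tree.DecisionTreeRegressor may be the better fit, or the targets could be mapped to integer class labels first.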
I am confused why the error report shows 3 required parameters for score() but the documentation only has 2. What is the "self" parameter? Can anyone help me solve this error?
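On the "self" part: in Python, "self" is the instance the method is called on, and it is filled in automatically for a call such as model.score(X, y). That would explain why the documentation lists only X and y (plus the optional sample_weight) while the traceback shows the full signature. A small sketch with a hypothetical class, not scikit-learn code, shows the two equivalent call styles:

```python
class Scorer(object):
    """Hypothetical stand-in for an estimator; not part of scikit-learn."""

    def __init__(self, offset):
        self.offset = offset

    def score(self, X, y):
        # `self` is the instance that the method was called on.
        return self.offset + len(X) + len(y)


model = Scorer(offset=1)

# Usual style: Python passes `model` as `self` automatically.
via_instance = model.score([1, 2], [3, 4])

# Equivalent explicit style: the instance is passed by hand.
via_class = Scorer.score(model, [1, 2], [3, 4])
```

Both calls run the same method with the same arguments, so "self" is never something the caller needs to supply in the usual model.score(...) form.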