I have a dataset that has a unique identifier and other features. It looks like this:
ID       LenA  TypeA  LenB  TypeB  Diff  Score  Response
123-456  51    M      101   L      50    0.2    0
234-567  46    S      49    S      3     0.9    1
345-678  87    M      70    M      17    0.7    0
I split it into training and test sets, and I am trying to classify the test data into two classes using a classifier trained on the training data. I want to keep the identifier in both the training and test sets so I can map the predictions back to the IDs.
Is there a way to mark the identifier column as an ID or non-predictor column, like we can in Azure ML Studio or SAS?
I am using the DecisionTreeClassifier from scikit-learn. This is the code I have for the classifier:
from sklearn import tree
clf = tree.DecisionTreeClassifier()
clf = clf.fit(traindata, trainlabels)
If I just include the ID in traindata, the code throws an error:
ValueError: invalid literal for float(): 123-456
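For illustration, here is a minimal sketch of one possible workaround, not a confirmed scikit-learn feature for marking ID columns. It assumes the data is loaded into a pandas DataFrame with the column names from the table above. The ID is moved into the DataFrame index so it stays attached to each row without being passed to the classifier as a feature. Encoding TypeA and TypeB with pd.get_dummies and splitting with train_test_split are also assumptions, since the original split and preprocessing are not shown.

```python
import pandas as pd
from sklearn import tree
from sklearn.model_selection import train_test_split

# Sample rows copied from the table in the question.
df = pd.DataFrame({
    "ID": ["123-456", "234-567", "345-678"],
    "LenA": [51, 46, 87],
    "TypeA": ["M", "S", "M"],
    "LenB": [101, 49, 70],
    "TypeB": ["L", "S", "M"],
    "Diff": [50, 3, 17],
    "Score": [0.2, 0.9, 0.7],
    "Response": [0, 1, 0],
})

# Move the identifier into the index: it travels with every row
# but is no longer a column in the feature matrix.
df = df.set_index("ID")

# Assumption: TypeA/TypeB are categorical, so one-hot encode them.
# String columns would otherwise fail the float conversion as well.
X = pd.get_dummies(df.drop(columns="Response"),
                   columns=["TypeA", "TypeB"], dtype=int)
y = df["Response"]

# The split keeps the ID index on both the training and test parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=1, random_state=0)

clf = tree.DecisionTreeClassifier(random_state=0)
clf = clf.fit(X_train, y_train)

# Re-attach the IDs to the predictions through the test index.
predictions = pd.Series(clf.predict(X_test),
                        index=X_test.index, name="Predicted")
print(predictions)
```

With this layout, predictions is indexed by ID, so each predicted class can be looked up or joined back to the original rows by its identifier.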