I have a dataset with an ID column for each sample as in this example:
id score1 score2 score3
1 0.41 0.37 0.04
2 0.19 0.33 0.277
3 0.21 0.33 0.037
4 0.49 0.23 0.378
5 0.51 0.78 0.041
To fit a classifier on this data and predict with it, I have to remove the ID column first:
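import numpy as np
from sklearn import model_selection
# Drop the ID column so it is not used as a feature
X = np.array(df.drop(columns=['id']))
X_train, X_test = model_selection.train_test_split(X, test_size=0.2)
clf.fit(X_train)
pred = clf.predict(X_test)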
How can I recover the ID for each prediction, so that I can tell whether each sample was classified correctly? I already know the correct label of every sample. Alternatively, is there a way to keep the ID (which could be numeric or non-numeric) through training?
I found this related question, but I can't work out what to do from it because it discusses other things such as Census Estimator, whereas I'm running a very simple Python script with the numpy and scikit-learn libraries.
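To make the goal concrete, here is a minimal sketch of the kind of result I am after. It assumes the data stays in a pandas DataFrame with the ID moved into the index, so the ID is not a feature but still travels with each row through `train_test_split`. The `label` column, its values, the `LogisticRegression` classifier, and `random_state=0` are all placeholders I made up for illustration, not part of my real data or script.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Toy data mirroring the example table; the "label" column is hypothetical.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "score1": [0.41, 0.19, 0.21, 0.49, 0.51],
    "score2": [0.37, 0.33, 0.33, 0.23, 0.78],
    "score3": [0.04, 0.277, 0.037, 0.378, 0.041],
    "label": [0, 1, 0, 1, 0],
})

# Move the ID into the index: it is excluded from the features
# but stays attached to every row.
df = df.set_index("id")
X = df.drop(columns=["label"])
y = df["label"]

# Splitting DataFrames/Series keeps the index (the IDs) on each part.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

clf = LogisticRegression()  # placeholder classifier
clf.fit(X_train, y_train)

# Re-attach the IDs to the predictions through the test set's index.
results = pd.DataFrame(
    {"true": y_test, "pred": clf.predict(X_test)},
    index=X_test.index,
)
results["correct"] = results["true"] == results["pred"]
print(results)
```

Since the index is never passed to the model as a feature, this should work the same way for numeric and non-numeric IDs.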