
I have a dataset with an ID column for each sample as in this example:

id score1 score2 score3
1  0.41   0.37   0.04
2  0.19   0.33   0.277
3  0.21   0.33   0.037
4  0.49   0.23   0.378
5  0.51   0.78   0.041

To fit an ML classifier on this data and make predictions, I have to remove the ID column from the data:

X = np.array(df.drop(['id'], axis=1))
X_train, X_test = model_selection.train_test_split(X, test_size=0.2)
clf.fit(X_train)
pred = clf.predict(X_test)

I am wondering how I can recover the IDs in the prediction results, so that I can tell for each sample whether it was classified correctly or not, since I already know the correct label of each sample. Or is there a way to keep the ID (which could be numeric or non-numeric) through training?

I found this related question, but I can't work out what to do from it, because it talks about other things like the Census Estimator, and I'm running a very simple Python script with just the numpy and scikit-learn libraries.

1 Answer


You can use the features of Pandas to do this. I used the iris dataset and the code below works fine; the label column holds the actual labels.

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

df = pd.read_csv("ids.csv", sep=",")
clf = LogisticRegression()

X = df              # keep every column, including id, so it survives the split
y = df['label']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
X_train_data = X_train.iloc[:, 1:5]   # feature columns only (drop id and label)
X_test_data = X_test.iloc[:, 1:5]
clf.fit(X_train_data, y_train)
pred = clf.predict(X_test_data)

sub = pd.DataFrame(data=X_test)       # the test rows, with their original ids
sub['pred'] = pred                    # attach the predictions
sub.head()                            # shows the first few rows

The result looks like this:

id   f1   f2   f3   f4   label  pred
144  6.8  3.2  5.9  2.3   2     2
68   5.8  2.7  4.1  1.0   1     1
10   4.9  3.1  1.5  0.1   0     0
137  6.3  3.4  5.6  2.4   2     2
46   4.8  3.0  1.4  0.3   0     0
JISHAD A.V
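
Since the asker already knows the true labels, one small extension of the answer above (a sketch that reuses the `sub` DataFrame built there, with the column names from its example output) is to flag each test row as correctly or incorrectly classified:

# Compare the known label with the prediction for every test row
# (column names follow the answer's example output)
sub['correct'] = sub['label'] == sub['pred']
print(sub[['id', 'label', 'pred', 'correct']].head())

Alternatively, staying closer to the numpy-based snippet in the question: scikit-learn's train_test_split accepts any number of equally long arrays and shuffles them together, so the id column can be carried through the split without ever being passed to the classifier. A minimal sketch, assuming the same ids.csv layout as in the answer (an id column, four feature columns, and a label column):

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

df = pd.read_csv("ids.csv", sep=",")

ids = df['id'].to_numpy()                        # keep the ids on the side
X = df.drop(columns=['id', 'label']).to_numpy()  # features only
y = df['label'].to_numpy()                       # known labels

# All three arrays are shuffled and split together,
# so id_test lines up row-for-row with pred below.
X_train, X_test, y_train, y_test, id_train, id_test = train_test_split(
    X, y, ids, test_size=0.2)

clf = LogisticRegression()
clf.fit(X_train, y_train)
pred = clf.predict(X_test)

results = pd.DataFrame({'id': id_test, 'label': y_test, 'pred': pred})
results['correct'] = results['label'] == results['pred']
print(results.head())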