Training model with ID column

Question

I am training a model with scikit-learn on a dataset that has an ID column. I remove the ID column when I train the model. But with the test dataset, I need to map the predictions back to the ID column after predicting.

What is the best way to do this? Can we set a non-predictor column when building a model in scikit-learn? Also, what about other ML tools such as TensorFlow and Spark ML? Do they support this feature?

I found this post on Stack Overflow, but I am looking for other options.

Just don't send the ID column while predicting. The output will match the input. – Vivek Kumar, Apr 18 '18 at 07:12
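A minimal sketch of what this comment describes, assuming the data lives in pandas DataFrames. The column names (id, f1, f2, label), the toy values, and the choice of LogisticRegression are all hypothetical and only there for illustration:

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data; column names and values are illustrative only.
train = pd.DataFrame({
    "id": [101, 102, 103, 104],
    "f1": [0.0, 1.0, 2.0, 3.0],
    "f2": [1.0, 0.0, 1.0, 0.0],
    "label": [0, 0, 1, 1],
})
test = pd.DataFrame({
    "id": [201, 202],
    "f1": [0.5, 2.5],
    "f2": [1.0, 0.0],
})

feature_cols = ["f1", "f2"]  # everything except the ID and the target

# Train without the ID column.
model = LogisticRegression().fit(train[feature_cols], train["label"])

# predict() returns one prediction per input row, in input order,
# so the predictions can be placed next to the IDs by position.
preds = model.predict(test[feature_cols])
result = pd.DataFrame({"id": test["id"].to_numpy(), "prediction": preds})
```

The ID never reaches the model; it is only reattached afterwards, relying on the row order being unchanged.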

score 2 · Answer 1 · answered Apr 18 '18 at 11:39


I assume you store your data (X) in a pd.DataFrame. If so, simply extract the values into a NumPy ndarray. The predictions come back in the same row order as the input, so you can reattach the DataFrame's index. A stylized scikit-learn example:

output = pd.Series(data=some_model.predict(X.values), index=X.index)
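A slightly fuller sketch of the same idea, under the assumption (not stated in the one-liner above) that the ID is moved into the DataFrame index so it is never part of X.values. The data, the column names, and the DecisionTreeClassifier are hypothetical:

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data; column names and values are illustrative only.
train = pd.DataFrame({
    "id": [101, 102, 103, 104],
    "f1": [0.0, 1.0, 2.0, 3.0],
    "f2": [1.0, 0.0, 1.0, 0.0],
    "label": [0, 0, 1, 1],
})
test = pd.DataFrame({
    "id": [201, 202],
    "f1": [0.5, 2.5],
    "f2": [1.0, 0.0],
})

# Assumption: the ID becomes the index, so it travels with each row
# but is not one of the feature columns.
X_train = train.set_index("id")[["f1", "f2"]]
y_train = train.set_index("id")["label"]
X = test.set_index("id")[["f1", "f2"]]

some_model = DecisionTreeClassifier(random_state=0)
some_model.fit(X_train.values, y_train.values)

# Same pattern as the one-liner: predictions are indexed by ID.
output = pd.Series(data=some_model.predict(X.values), index=X.index, name="prediction")
```

With this layout, output.loc[201] looks up the prediction for ID 201, and output.reset_index() turns the result back into a two-column table.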


Jan K


And which algo will let us run `.predict()` on an X matrix wider by 1 column than X used for training? Or did we train on the ID column as well? – mirekphd Jan 26 '22 at 11:55
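One possible way to address this, sketched under assumptions: no estimator needs to accept the extra column if the column selection is part of the model itself. scikit-learn's ColumnTransformer drops any column it is not told about (remainder="drop" is the default), so a Pipeline built around it can be fitted and queried with DataFrames that still contain the ID. The data and column names below are hypothetical:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Hypothetical toy data; column names and values are illustrative only.
train = pd.DataFrame({
    "id": [101, 102, 103, 104],
    "f1": [0.0, 1.0, 2.0, 3.0],
    "f2": [1.0, 0.0, 1.0, 0.0],
    "label": [0, 0, 1, 1],
})
test = pd.DataFrame({
    "id": [201, 202],
    "f1": [0.5, 2.5],
    "f2": [1.0, 0.0],
})

# Pass the feature columns through unchanged; every other column
# (here: id) is dropped because remainder defaults to "drop".
select = ColumnTransformer([("features", "passthrough", ["f1", "f2"])])

pipe = Pipeline([("select", select), ("clf", LogisticRegression())])

# Both fit() and predict() receive frames that still contain the ID.
pipe.fit(train.drop(columns="label"), train["label"])
preds = pipe.predict(test)

result = test[["id"]].assign(prediction=preds)
```

The classifier itself is still trained on the two feature columns only; the ID is filtered out inside the pipeline rather than by the caller.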
