
I am training a model with scikit-learn on a dataset that has an ID column. I remove the ID column before training the model, but for the test dataset I need to map the predictions back to the ID column afterwards.

What is the best way to do this? Can we mark a column as a non-predictor when building a model in scikit-learn? And what about other ML tools in general, such as TensorFlow and Spark ML? Do they support this feature?

I found this post on Stack Overflow, but I am looking for other options.

Gayatri

1 Answer


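I assume you store your data (X) in a pd.DataFrame, with the ID as its index rather than as a feature column. If that is the case, simply extract the values into a NumPy ndarray for the model. The rows keep their order, so the predictions line up with the original index. A stylized scikit-learn example: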

output = pd.Series(data=some_model.predict(X.values), index=X.index) 
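A fuller sketch of the same idea, in case the one-liner is too terse. It assumes the ID starts out as an ordinary column and is moved into the index before training, so the model never sees it as a feature. The column names ("id", "f1", "f2", "label"), the toy values, and the choice of DecisionTreeClassifier are all made up for illustration, not taken from the question.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data; all column names and values are illustrative only.
train = pd.DataFrame({
    "id": [101, 102, 103, 104],
    "f1": [0.0, 1.0, 0.0, 1.0],
    "f2": [5.0, 3.0, 4.0, 1.0],
    "label": [0, 1, 0, 1],
})
test = pd.DataFrame({
    "id": [201, 202, 203],
    "f1": [1.0, 0.0, 1.0],
    "f2": [2.0, 6.0, 0.5],
})

# Move the ID into the index so it is never passed to the model as a feature.
X_train = train.set_index("id").drop(columns="label")
y_train = train.set_index("id")["label"]
X_test = test.set_index("id")

# Train on the feature values only (two columns here, no ID).
model = DecisionTreeClassifier(random_state=0).fit(X_train.values, y_train.values)

# predict() returns one value per input row, in input order,
# so reusing the index maps each prediction back to its ID.
output = pd.Series(model.predict(X_test.values), index=X_test.index, name="prediction")

# Turn the ID back into a regular column if a flat table is more convenient.
result = output.reset_index()
```

If the ID has to stay a regular column, dropping it just before fit and predict (for example with X.drop(columns="id")) and reattaching it afterwards should work the same way, as long as the row order is not changed in between.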
Jan K
  • And which algo will let us run `.predict()` on an X matrix wider by 1 column than X used for training? Or did we train on the ID column as well? – mirekphd Jan 26 '22 at 11:55