
Our team is using CatBoost to develop credit scoring models, and our current process is to...

  1. Sort the data chronologically for out-of-time sampling, and split it into train, valid, and test sets
  2. Perform feature engineering
  3. Perform feature selection and hyperparameter tuning (mainly learning rate) on train, using valid as an eval set for early stopping
  4. Perform hyperparameter tuning on the combination of train and valid, using test as an eval set for early stopping
  5. Evaluate the results of Step #4 using standard metrics (RMSE, ROC AUC, etc.)

However, I am concerned that we may be overfitting to the test set in Step #4.

In Step #4, should we just be refitting the model on train and valid without tuning (i.e., using the selected features and hyperparameters from Step #3)?

The motivation for having a Step #4 at all is to train the models on more recent data due to our out-of-time sampling scheme.
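
For concreteness, here is a rough sketch of what Steps 1, 3, and the proposed no-tuning version of Step 4 would look like (the column names, split fractions, and learning rate below are illustrative placeholders, not our actual values):

```python
import pandas as pd
from sklearn.metrics import roc_auc_score
from catboost import CatBoostClassifier

# df is assumed to be a pandas DataFrame already sorted chronologically,
# with a binary "target" column; names and split fractions are illustrative.
n = len(df)
train = df.iloc[: int(0.6 * n)]
valid = df.iloc[int(0.6 * n): int(0.8 * n)]
test = df.iloc[int(0.8 * n):]
features = [c for c in df.columns if c != "target"]

# Step 3: tune on train, using valid only for early stopping / model selection.
model = CatBoostClassifier(
    learning_rate=0.05,          # one candidate value from the tuning loop
    iterations=5000,
    eval_metric="AUC",
    early_stopping_rounds=100,
    verbose=False,
)
model.fit(train[features], train["target"],
          eval_set=(valid[features], valid["target"]))

# Proposed Step 4: refit on train + valid with the settings chosen in Step 3,
# without using the test set for tuning or early stopping.
train_valid = pd.concat([train, valid])
final_model = CatBoostClassifier(
    learning_rate=0.05,
    iterations=model.get_best_iteration(),  # roughly the tree count found in Step 3
    verbose=False,
)
final_model.fit(train_valid[features], train_valid["target"])

# Step 5: a single evaluation on the untouched test set.
test_auc = roc_auc_score(test["target"],
                         final_model.predict_proba(test[features])[:, 1])
```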

swritchie
  • I thought the test set should never be used for any tuning, only to determine the winner after all optimization is done. But then I'm not really an expert. Good question. A bit less focused on a concrete programming problem, but I think still on-topic. Perhaps [CrossValidated StackExchange](https://stats.stackexchange.com/) would be an even better fit for this question. – NoDataDumpNoContribution Oct 20 '21 at 18:30
  • I think the test set must not be used for tuning hyperparameters – Patricio Loncomilla Oct 20 '21 at 18:31
  • @Trilarion, that is my thought as well, and thanks for the recommendation--will repost there too. – swritchie Oct 20 '21 at 18:45
  • @swritchie Please only repost if you are not satisfied with the answer here and if you can add to the question why you are not satisfied with it (like adding details as in the comment to this answer). Otherwise you may get exactly the same answer. – NoDataDumpNoContribution Oct 20 '21 at 18:53

1 Answer


Step #4 falls outside of the best practices for machine learning.

When you create the test set, you need to set it aside and only use it at the end to evaluate how successful your model(s) are at making predictions. Do not use the test set to inform hyperparameter tuning! If you do, you will overfit to the test set, and your final performance estimate will be optimistically biased.

Try using cross-validation instead:

[cross-validation diagram]
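
As a rough sketch (assuming a feature matrix `X` and binary target `y` for the training period, with the test set already held out), CatBoost's built-in `cv` helper lets you score a candidate parameter set across several folds without ever touching that test set:

```python
from catboost import Pool, cv

# X, y are assumed: the training-period features and binary target,
# with the test set already split off and reserved for the final evaluation.
params = {
    "loss_function": "Logloss",
    "eval_metric": "AUC",
    "learning_rate": 0.05,   # the candidate value being tried
    "iterations": 2000,
}

cv_results = cv(
    Pool(X, y),
    params,
    fold_count=5,
    early_stopping_rounds=100,
    verbose=False,
)

# cv_results is a DataFrame of per-iteration fold statistics; compare candidate
# parameter sets by their best mean validation AUC, then refit once on all
# training data before the single test-set evaluation.
best_auc = cv_results["test-AUC-mean"].max()
```

For time-ordered data like yours, a time-aware splitting scheme is usually a better fit than shuffled folds (see the comment below on TimeSeriesSplit).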

NolantheNerd
  • That is my thought as well, and while I do not think an ordinary StratifiedKFold cross-validation scheme will work for us due to the out-of-time sampling, I think something like TimeSeriesSplit might. I just have to wrap my head around how to do it; sklearn and catboost do not play very nicely together. – swritchie Oct 20 '21 at 18:51
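
One way to wire the two together by hand is to let TimeSeriesSplit generate the fold indices and fit a fresh CatBoost model per window; a minimal sketch, assuming a chronologically sorted feature DataFrame `X` and target series `y` (hypothetical names):

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit
from sklearn.metrics import roc_auc_score
from catboost import CatBoostClassifier

# X, y are assumed: a chronologically sorted pandas DataFrame / Series,
# with the final test period already held out for the last evaluation.
tscv = TimeSeriesSplit(n_splits=5)
fold_aucs = []

for train_idx, valid_idx in tscv.split(X):
    # Each split trains on an earlier window and validates on the window
    # that follows it, mimicking the out-of-time setup.
    model = CatBoostClassifier(
        learning_rate=0.05,
        iterations=2000,
        early_stopping_rounds=100,
        verbose=False,
    )
    model.fit(X.iloc[train_idx], y.iloc[train_idx],
              eval_set=(X.iloc[valid_idx], y.iloc[valid_idx]))
    preds = model.predict_proba(X.iloc[valid_idx])[:, 1]
    fold_aucs.append(roc_auc_score(y.iloc[valid_idx], preds))

print(f"mean out-of-time AUC: {np.mean(fold_aucs):.4f}")
```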