
I know this is very basic, but I'm really confused and would like to understand parameter tuning better.

I'm working on a benchmark dataset that is already partitioned into three splits (training, development, and testing), and I would like to tune my classifier's parameters using GridSearchCV from sklearn.

Which is the correct partition for tuning the parameters: the development split or the training split?

I've seen researchers in the literature mention that they "tuned the parameters using GridSearchCV on the development split"; another example is found here.

Do they mean they trained on the training split and then tested on the development split? Or do ML practitioners usually mean they perform GridSearchCV entirely on the development split?

I'd really appreciate a clarification. Thanks,

Yahya
user3446905

1 Answer


Usually in a 3-way split, you train a model on the training set, then validate it on the development set (also called the validation set) to tune hyperparameters, and only after all the tuning is complete do you perform a final evaluation of the model on the previously unseen test set (also known as the evaluation set).
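To make that workflow concrete, here is a minimal sketch in scikit-learn. The dataset, classifier (`SVC`), and the values of `C` tried are illustrative assumptions, not part of the question; the point is only that candidates are fit on the training split, compared on the development split, and the test split is touched exactly once at the end.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Illustrative data; a real benchmark would ship its own three splits.
X, y = make_classification(n_samples=600, random_state=0)
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.4, random_state=0)
X_dev, X_test, y_dev, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, random_state=0)

# Tune: fit each candidate on train, score it on dev.
best_score, best_C = -1.0, None
for C in [0.1, 1, 10]:
    model = SVC(C=C).fit(X_train, y_train)
    score = model.score(X_dev, y_dev)
    if score > best_score:
        best_score, best_C = score, C

# Final, one-time evaluation on the untouched test set.
final_model = SVC(C=best_C).fit(X_train, y_train)
print("best C:", best_C, "test accuracy:", final_model.score(X_test, y_test))
```
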

In a two-way split you only have a train set and a test set, so tuning and evaluation end up being performed on the same test set.

hellpanderr
  • Could you explain how I can use sklearn GridSearchCV if I have two splits (train and dev)? thanks – user3446905 Sep 29 '18 at 11:13
  • @user3446905 You can either concatenate them and send as one set to `gridsearchcv.fit()` to allow it do the splitting for you, or you can force it using predefined split as described here https://stackoverflow.com/questions/48390601/explicitly-specifying-test-train-sets-in-gridsearchcv – hellpanderr Sep 29 '18 at 11:26
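A brief sketch of the predefined-split approach from the comment above: concatenate the train and dev splits, then tell `GridSearchCV` via `PredefinedSplit` exactly which rows form the single validation fold (`-1` = always used for training, `0` = the dev fold). The data and parameter grid are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, PredefinedSplit
from sklearn.svm import SVC

# Illustrative stand-ins for the question's train and dev splits.
X, y = make_classification(n_samples=500, random_state=0)
X_train, y_train = X[:400], y[:400]
X_dev, y_dev = X[400:], y[400:]

# Concatenate, and mark train rows -1 (never validated) and dev rows 0.
X_all = np.vstack([X_train, X_dev])
y_all = np.concatenate([y_train, y_dev])
test_fold = np.concatenate([np.full(len(X_train), -1),
                            np.zeros(len(X_dev), dtype=int)])

gs = GridSearchCV(SVC(), {"C": [0.1, 1, 10]},
                  cv=PredefinedSplit(test_fold))
gs.fit(X_all, y_all)
print(gs.best_params_)
```

Note that with the default `refit=True`, the best estimator is refit on all of `X_all` (train + dev) after the search, which is usually what you want before the final test-set evaluation.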