I want to do two cross-validation processes in Spark using randomSplit, as follows:
- CV_global: split the data into a training set (90%) and a testing set (10%).
  1.1. CV_grid: grid search on half of the training set, i.e. 45% of the data.
  1.2. Fit model: on the training set (90%), using the best settings from CV_grid.
  1.3. Test model: on the testing set (10%).
- Report the average metrics over the 10 folds, as well as the global metrics.
The problem is that I can only find examples that run CV and grid search on the whole training set.
How can I get the parameters of the best-performing model from CV_grid?
How can I do CV without grid search but still get statistics per fold, e.g. like sklearn.cross_validation.cross_val_score?