Transformation of data during cross validation

Question

I am using an H2OGeneralizedLinearEstimator in h2o.ai.

I am planning to use the built-in cross-validation option to get cross-validated performance metrics. Before fitting the model, I perform some transformations (mainly scaling and translating) whose parameters depend on the data they are applied to.

Ideally these transformations should be "trained" only on the training set and applied as is to the test data. In principle, the same should happen during cross validation: at each cross-validation step, the transformation should be trained on that fold's training data and then applied to that fold's test data.
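To make the intent concrete, here is a minimal plain-Python sketch of the per-fold procedure described above. It does not use H2O at all; the helper names (fit_transform_params, apply_transform, cv_splits) and the toy data are made up for illustration, and mean/standard deviation stand in for whatever translation and scaling is actually needed.

```python
from statistics import mean, stdev


def fit_transform_params(train_values):
    """Learn the translation (mean) and scaling (standard deviation) from training data only."""
    return mean(train_values), stdev(train_values)


def apply_transform(values, center, scale):
    """Apply previously learned parameters as is, without re-estimating them."""
    return [(v - center) / scale for v in values]


def cv_splits(n_rows, n_folds):
    """Yield (train_idx, test_idx) pairs; row i is held out in fold i % n_folds."""
    for k in range(n_folds):
        test_idx = [i for i in range(n_rows) if i % n_folds == k]
        train_idx = [i for i in range(n_rows) if i % n_folds != k]
        yield train_idx, test_idx


# Toy numeric column (hypothetical data).
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

for train_idx, test_idx in cv_splits(len(values), n_folds=3):
    # Parameters are estimated on this fold's training rows only...
    center, scale = fit_transform_params([values[i] for i in train_idx])
    # ...and then reused unchanged on the held-out rows.
    test_scaled = apply_transform([values[i] for i in test_idx], center, scale)
    print(test_idx, center, test_scaled)
```

The point of the sketch is that the held-out rows never influence center or scale, which is what I would like the built-in cross validation to respect.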

Is it possible to do so in H2O, without having to manually implement a cross validation loop?

Thanks

score 0 · Answer 1 · answered Apr 03 '23 at 20:49

If you're using the H2O GLM, you don't need to scale the data yourself, because the model can do that automatically if you set normalize to True. If there are other transformations you need to apply for some reason, then you'd want to set up a manual CV loop, but hopefully you can just use the built-in scaling.
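A manual CV loop along those lines might look like the sketch below. It is untested against a live cluster and rests on several assumptions: the data is already loaded as an H2OFrame, the response is numeric (hence family="gaussian"), and the transformation is a mean shift plus standard-deviation scaling. The function names shift_scale and manual_cv_rmse are hypothetical. Note that in H2OGeneralizedLinearEstimator the built-in scaling option is named standardize; it is switched off here so that only the custom transformation is applied.

```python
def shift_scale(column, center, scale):
    """Translate and scale; works on plain floats and on H2OFrame columns alike."""
    return (column - center) / scale


def manual_cv_rmse(frame, predictors, response, numeric_cols, n_folds=5, seed=42):
    """Manual k-fold CV on an H2OFrame, refitting the transformation on each training split."""
    # Imported lazily so shift_scale stays usable without an H2O installation.
    from h2o.estimators import H2OGeneralizedLinearEstimator

    # One-column frame assigning each row to a fold in 0..n_folds-1.
    folds = frame.kfold_column(n_folds=n_folds, seed=seed)
    scores = []
    for k in range(n_folds):
        train = frame[folds != k, :]
        test = frame[folds == k, :]
        for col in numeric_cols:
            # Statistics come from the training split only.
            center = train[col].mean()[0]
            scale = train[col].sd()[0]  # assumed non-zero
            train[col] = shift_scale(train[col], center, scale)
            test[col] = shift_scale(test[col], center, scale)
        # standardize=False so the built-in scaling does not run on top of ours.
        model = H2OGeneralizedLinearEstimator(family="gaussian", standardize=False)
        model.train(x=predictors, y=response, training_frame=train)
        scores.append(model.model_performance(test_data=test).rmse())
    # Average held-out RMSE across folds.
    return sum(scores) / len(scores)
```

After h2o.init() and loading a frame, this would be called as something like manual_cv_rmse(frame, ["x1", "x2"], "y", numeric_cols=["x1", "x2"]), where the column names are placeholders. For a pure translation, pass a scale of 1 instead of the standard deviation. The model fits and the column arithmetic still run on the H2O cluster; only the loop over folds runs in the Python client.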

answered Apr 03 '23 at 20:49

Erin LeDell

The only option I see is called standardize. Is this what you are referring to? If so, I am afraid I cannot use it, since all I need to do is a translation of the numerical variables. Isn't there another way within h2o to perform cross validation with custom transformations while still leveraging distributed computing? Maybe using sklearn wrappers or AutoML? – Simone Meloni Apr 03 '23 at 21:13
There's a sklearn interface for h2o, maybe you could use that? https://github.com/h2oai/h2o-tutorials/tree/master/tutorials/sklearn-integration – Erin LeDell Apr 19 '23 at 19:13