The documentation for the standardize parameter (https://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-science/algo-params/standardize.html) lists only these algorithms: Deep Learning, GLM, GAM, K-Means.

I have two questions:

  1. Does this mean that other algorithms, such as Random Forest and Gradient Boosting, do not standardize (at least not automatically in AutoML)?

  2. Does standardize = TRUE in Deep Learning, GLM, ..., standardize the target variable as well, or only the features?

A related question is Feature Standardize in AutoML H2O.

desertnaut
Matthew Son
    Tree-based algorithms (RF, gradient boosting etc) do not require standardization; see own answer in [Is it necessary to normalize data for XGBoost?](https://datascience.stackexchange.com/questions/60950/is-it-necessary-to-normalize-data-for-xgboost/60954#60954) in Data Science SE. – desertnaut Apr 28 '23 at 18:48

1 Answer

Regarding your question 1: Correct. For algorithms that do not have the standardize parameter, the predictors are not standardized. Tree-based algorithms use comparisons like val >= threshold to decide which child node to descend into. If we implemented standardization, we would have to evaluate (val - mean) / standard deviation >= threshold instead. Choosing not to standardize saves us a lot of time during tree traversal, because we don't need to standardize the predictors each time we evaluate the expression val >= threshold.
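
To illustrate why skipping standardization costs a tree nothing, here is a minimal sketch in plain Python. It is not H2O's implementation; `best_split`, `sse`, and the toy data are made up for illustration. It fits a one-feature regression stump on a raw predictor and on its standardized copy. Because standardization is an increasing affine rescaling, it preserves the ordering of the values, so both stumps should put the same rows on the same side, and only the threshold is expressed on a different scale.

```python
from statistics import mean, pstdev


def sse(values):
    """Sum of squared errors of values around their mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(xs, ys):
    """Return (threshold, left_rows) for the one-feature split with lowest SSE.

    Hypothetical helper: rows with x < threshold go left, the rest go right.
    """
    best_err, best_thr, best_left = None, None, None
    distinct = sorted(set(xs))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(distinct, distinct[1:]):
        thr = (lo + hi) / 2
        left = frozenset(i for i, x in enumerate(xs) if x < thr)
        right = [i for i in range(len(xs)) if i not in left]
        err = sse([ys[i] for i in left]) + sse([ys[i] for i in right])
        if best_err is None or err < best_err:
            best_err, best_thr, best_left = err, thr, left
    return best_thr, best_left


# Toy data (illustrative only): one predictor on a large scale, one response.
xs = [1200.0, 1500.0, 3100.0, 3300.0, 9000.0]
ys = [1.0, 1.2, 3.9, 4.1, 4.0]

# Standardize the predictor: (val - mean) / standard deviation.
mu, sigma = mean(xs), pstdev(xs)
zs = [(x - mu) / sigma for x in xs]

raw_thr, raw_left = best_split(xs, ys)
std_thr, std_left = best_split(zs, ys)

print("same partition:", raw_left == std_left)
print("raw threshold:", raw_thr, "standardized threshold:", std_thr)
```

The standardized threshold is just the raw threshold pushed through the same (val - mean) / standard deviation transform, so the two trees make identical predictions.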

Regarding question 2: When you set standardize=true, only the numerical features are standardized. The response column is not standardized.
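
As a rough picture of what "only the numerical features" means, the sketch below rescales numeric predictor columns and leaves the categorical column and the response untouched. It is plain Python, not H2O code: `standardize_features`, the column names, and the use of the sample standard deviation are all assumptions made for the example, and H2O's internal computation may differ in detail.

```python
from statistics import mean, stdev


def standardize_features(rows, response):
    """Return copies of rows with numeric predictors rescaled to mean 0, sd 1.

    Hypothetical helper: the response column and non-numeric columns are
    copied through unchanged.
    """
    numeric_cols = [
        col for col, val in rows[0].items()
        if col != response
        and isinstance(val, (int, float))
        and not isinstance(val, bool)
    ]
    # Per-column (mean, sample standard deviation); the choice of sample
    # rather than population sd is an assumption of this sketch.
    stats = {
        col: (mean([r[col] for r in rows]), stdev([r[col] for r in rows]))
        for col in numeric_cols
    }
    scaled_rows = []
    for r in rows:
        new_row = dict(r)
        for col, (m, s) in stats.items():
            new_row[col] = (r[col] - m) / s
        scaled_rows.append(new_row)
    return scaled_rows


# Toy data (illustrative only): two numeric predictors, one categorical
# predictor, and a numeric response named "price".
rows = [
    {"age": 20, "income": 30000.0, "city": "A", "price": 100.0},
    {"age": 30, "income": 50000.0, "city": "B", "price": 150.0},
    {"age": 40, "income": 70000.0, "city": "A", "price": 210.0},
]

scaled = standardize_features(rows, response="price")
for row in scaled:
    print(row)
```

In the output, age and income are centered and scaled, while city and price keep their original values.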

Wendy