
I understand that sklearn requires categorical features to be one-hot encoded (converted to dummy variables) before fitting sklearn.ensemble.RandomForestRegressor, and that XGBoost requires the same, whereas h2o accepts raw categorical features in h2o.estimators.random_forest.H2ORandomForestEstimator. Since h2o4gpu's implementation of random forest is built on top of XGBoost, does this mean that it does not support raw categorical features?

S.Kumar

1 Answer


h2o4gpu has no native support for categorical columns (at least not yet), so you will have to one-hot encode (or label encode) your categorical columns, just as you do for sklearn and xgboost.
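As a rough illustration of the two encodings mentioned above, here is a minimal sketch using pandas. The toy DataFrame and its column names (`color`, `size`, `price`) are made up for the example, and the step of passing the encoded matrix to an h2o4gpu estimator is deliberately left out, since it should work the same way as with any sklearn-style estimator that expects numeric input.

```python
import pandas as pd

# Hypothetical toy data: "color" is categorical, "size" is numeric,
# and "price" is the regression target.
df = pd.DataFrame({
    "color": ["red", "green", "blue", "green"],
    "size": [1.0, 2.5, 3.0, 0.5],
    "price": [10.0, 20.0, 30.0, 15.0],
})

X_raw = df[["color", "size"]]
y = df["price"]

# Option 1: one-hot encoding. Each category becomes its own 0/1 column.
X_onehot = pd.get_dummies(X_raw, columns=["color"], dtype=float)

# Option 2: label encoding. Each category is mapped to an integer code.
# Note that this imposes an arbitrary (alphabetical) order on the categories.
X_label = X_raw.copy()
X_label["color"] = X_label["color"].astype("category").cat.codes

# Either X_onehot or X_label is now fully numeric and can be passed,
# together with y, to an estimator that does not accept raw categoricals.
```

One-hot encoding avoids implying an order between categories but adds one column per level, which can get wide for high-cardinality features; label encoding stays compact, and tree-based models can often still split usefully on the integer codes.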

Erin LeDell
  • Is there a reason for this that I can read about online? The main advantage of `h2o` for me was its use of categorical features... – S.Kumar Jul 27 '18 at 14:40
  • The reason is that **h2o4gpu** was released only a few months ago. It is still in alpha, and many features are not complete yet. – Erin LeDell Jul 30 '18 at 17:15