I have some questions about the H2O Distributed Random Forest (DRF) model

Question

In the H2O docs, the FAQ of the DRF section includes this note under "How does the algorithm handle missing values during training?":

Note: Unlike in GLM, in DRF numerical values are handled the same way as categorical values. Missing values are not imputed with the mean, as is done by default in GLM.

I use the DRF algorithm to solve a regression problem, and this note confused me. Converting all numerical values to categorical values in order to solve a regression problem does not make sense to me.

Here is my question:

Do I need to convert all numerical values to categorical values to use the DRF algorithm?

or

Can I use the DRF algorithm without converting numerical values to categorical values?

Thank you for reading my question.

If the two types of values are *handled* the same way, there is no need to convert the values either way. If you feel that is incorrect for your problem, you may need another classifier. — , Apr 18 '18 at 09:23
Look at the context of the quote in the documentation. The section in the FAQ is "missing values during training", and the full note reads "Note: Unlike in GLM, in DRF numerical values are handled the same way as categorical values. Missing values are not imputed with the mean, as is done by default in GLM.". It's about how DRF handles missing values, not about values in general. — , Apr 18 '18 at 09:26
Thank you for commenting on my question. According to your comment, I don't need to convert numeric values to categorical values. Is that correct? — youngho, Apr 18 '18 at 09:35

score 1 · Accepted Answer · answered Apr 18 '18 at 10:35

No, H2O does not require you to convert all numerical values to categorical values.

If you want to view how trained H2O DRF models treat the different input columns, follow the instructions below for how to view a MOJO.

http://docs.h2o.ai/h2o/latest-stable/h2o-genmodel/javadoc/overview-summary.html#viewing-a-mojo

Note in the picture below that numerical columns are treated with a "less than" value comparison, and categorical columns are treated by sending some of the levels to the left child and some to the right child.
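
The two split styles described above can be sketched in plain Python. This is only an illustration of the idea, not H2O's actual implementation: the class names `NumericSplit` and `CategoricalSplit`, the column names, the threshold, and the level sets are all made up for the example.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class NumericSplit:
    """Hypothetical tree node that splits a numerical column with a 'less than' test."""
    column: str
    threshold: float

    def goes_left(self, row: dict) -> bool:
        # Rows whose value is below the threshold go to the left child.
        return row[self.column] < self.threshold


@dataclass(frozen=True)
class CategoricalSplit:
    """Hypothetical tree node that splits a categorical column by level membership."""
    column: str
    left_levels: FrozenSet[str]

    def goes_left(self, row: dict) -> bool:
        # Rows whose level is in the chosen subset go left; all other levels go right.
        return row[self.column] in self.left_levels


# Example row with one numerical and one categorical column (made-up data).
row = {"age": 42.0, "color": "red"}

numeric_node = NumericSplit(column="age", threshold=50.0)
categorical_node = CategoricalSplit(column="color", left_levels=frozenset({"blue", "green"}))

print(numeric_node.goes_left(row))      # True, because 42.0 < 50.0
print(categorical_node.goes_left(row))  # False, because "red" is not in the left subset
```

Because the two column types get different kinds of split tests, a numerical column can stay numerical; nothing in this picture requires converting it to a categorical one.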
