H2O Deep Learning R

Question

H2O Deep Learning runs regression by default, even though I have ensured that the target variable is a factor (with only two levels). Any leads on how to resolve this?

Below is the code:

```r
dnn_mod <-
  h2o.deeplearning(x = 2:321,  # column numbers for predictors
                   y = 322,   # column number for label
                   training_frame = sdcs_data, # data in H2O format
                   activation = "TanhWithDropout", # or 'Tanh'
                   input_dropout_ratio = 0.2, # % of inputs dropout
                   hidden_dropout_ratios = c(0.3,0.3,0.3), # % for nodes dropout
                   balance_classes = FALSE,
                   hidden = c(150,150,150),
                   epochs = 500,
                   #standardize = TRUE,
                   epsilon = 1.0e-5,
                   loss = "CrossEntropy",
                   stopping_rounds = 50,
                   stopping_metric = "AUC")
                   #classification = TRUE)
```

It defaults to running a regression model rather than classification. Parameters like the CrossEntropy loss don't make sense in that case and throw an error. – Vibhor Kalra, Apr 05 '16 at 22:41
I don't think that is the point. The problem is not the error, it's that H2O doesn't run classification. – Vibhor Kalra, Apr 06 '16 at 07:28
Some data and some more code would be nice, aka a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – phiver, Apr 06 '16 at 08:51

score 3 · Answer 1 · answered Apr 06 '16 at 00:27

If you want to run classification, then your response variable must be encoded as a "factor" (aka "enum") type. See this R code example from the H2O Deep Learning booklet. This is the case for all H2O algorithms.
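
As a minimal sketch of that conversion, assuming a local H2O cluster can be started and using a small made-up data frame (`toy`, `toy_hex`, and the column names are hypothetical, not the asker's `sdcs_data`), the response can be converted and its type checked like this:

```r
library(h2o)
h2o.init()  # start or connect to a local H2O cluster

# Hypothetical toy data: two numeric predictors and a 0/1 label
set.seed(1)
toy <- data.frame(x1 = rnorm(100),
                  x2 = rnorm(100),
                  label = rbinom(100, 1, 0.5))
toy_hex <- as.h2o(toy)

# Type of the label column as uploaded (a 0/1 column arrives as a numeric type)
print(h2o.getTypes(toy_hex)[[3]])

# Convert the response inside the H2O frame, then check the type again;
# for classification it should now report "enum"
toy_hex$label <- as.factor(toy_hex$label)
print(h2o.getTypes(toy_hex)[[3]])
```

The check is made on the H2O frame itself, since that is the object passed as `training_frame`.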

Erin LeDell


Per my question, I have already ensured that the response is encoded as a "factor". – Vibhor Kalra Apr 06 '16 at 07:27
Are you sure that column 322 is the response column? Please paste the output of `h2o.describe(sdcs_data[,322])` or `h2o.getTypes(data)[[322]]`. It should say "enum". – Erin LeDell Apr 07 '16 at 19:36
Additionally, you can explicitly set `distribution = "bernoulli"`, and if you have mistakenly not converted your response to a factor, then you will see a message that includes the following: `"bernoulli distribution is not allowed for regression."` – Erin LeDell Apr 07 '16 at 19:37
Lastly, if you are seeing the error, `"For CrossEntropy loss, the response must be categorical."`, then that means that you have not converted the response to a factor. – Erin LeDell Apr 07 '16 at 19:39
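
The `distribution = "bernoulli"` check suggested in the comments can be sketched as follows. This assumes a local H2O cluster and a hypothetical toy frame (`toy_hex`), with a deliberately tiny network so it runs quickly. The exact error text may differ between H2O versions, so the sketch only prints whatever message comes back.

```r
library(h2o)
h2o.init()  # start or connect to a local H2O cluster

# Hypothetical toy data with a 0/1 label that depends on the predictors
set.seed(1)
toy <- data.frame(x1 = rnorm(200), x2 = rnorm(200))
toy$label <- as.integer(toy$x1 + toy$x2 > 0)
toy_hex <- as.h2o(toy)

# With a numeric response, asking for a bernoulli distribution is expected
# to be rejected; capture the message instead of stopping the script
res <- tryCatch(
  h2o.deeplearning(x = 1:2, y = 3, training_frame = toy_hex,
                   hidden = c(5, 5), epochs = 1,
                   distribution = "bernoulli"),
  error = function(e) conditionMessage(e))
if (is.character(res)) cat("Rejected:", res, "\n")

# After converting the response to a factor, the same call trains a classifier
toy_hex$label <- as.factor(toy_hex$label)
fit <- h2o.deeplearning(x = 1:2, y = 3, training_frame = toy_hex,
                        hidden = c(5, 5), epochs = 1,
                        distribution = "bernoulli")
print(class(fit))  # the model class shows whether classification was run
```

Setting the distribution explicitly makes a wrongly typed response fail up front rather than silently producing a regression model.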
