I am training a classification model with H2O in R and would like to extract the fitted model's predictions for my training dataset.

Code:

library(h2o)
h2o.init()

# train and test are existing R data frames
train <- as.h2o(train)
test <- as.h2o(test)
y <- "class"
x <- setdiff(names(train), y)
family <- "multinomial"
nfolds <- 5
gbm1 <- h2o.gbm(x = x, y = y, distribution = family,
                training_frame = train,
                seed = 1,
                nfolds = nfolds,
                fold_assignment = "Modulo",
                keep_cross_validation_predictions = TRUE)
# Current attempt: reads the prediction frame of the last fold only, columns 2 to 4
h2o.getFrame(gbm1@model$cross_validation_predictions[[gbm1@allparameters$nfolds]]$name)[, 2:4]
Vijayan N
  • Which type of model? Please paste a sample of your code so I understand what you are trying to do. I assume you would want predictions on a test set rather than your training set...? – Erin LeDell Mar 27 '17 at 19:25
  • @ErinLeDell added the code. No, I want my trained model's predictions, that is, gbm1's fitted predictions. – Vijayan N Mar 27 '17 at 19:59
  • Ok, I see -- you want the cross-validated predictions. Thanks for clarifying. – Erin LeDell Mar 27 '17 at 23:46

1 Answer

Here is a simple example of how to extract the cross-validated predictions from a trained H2O model in R (using the Iris dataset).

library(h2o)
h2o.init(nthreads = -1)

data(iris)
train <- as.h2o(iris)
y <- "Species"
x <- setdiff(names(train), y)
family <- "multinomial"
nfolds <- 5 

gbm1 <- h2o.gbm(x = x, y = y, 
                distribution = family,
                training_frame = train,
                seed = 1,
                nfolds = nfolds,
                fold_assignment = "Modulo",
                keep_cross_validation_predictions = TRUE)

cvpreds_id <- gbm1@model$cross_validation_holdout_predictions_frame_id$name
cvpreds <- h2o.getFrame(cvpreds_id)

The cvpreds object is an H2OFrame that looks like this:

> cvpreds
  predict    setosa   versicolor    virginica
1  setosa 0.9986012 0.0008965135 0.0005022631
2  setosa 0.9985695 0.0004486762 0.0009818434
3  setosa 0.9981387 0.0004777671 0.0013835724
4  setosa 0.9985246 0.0006259377 0.0008494549
5  setosa 0.9989924 0.0005033832 0.0005042294
6  setosa 0.9981410 0.0013581692 0.0005008536

[150 rows x 4 columns] 
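
If in-sample ("fitted") predictions are wanted as well as the cross-validated ones, a minimal sketch along these lines may help. It assumes a running local H2O cluster and that the holdout predictions frame keeps the row order of the training frame. The variable names `fitted_preds`, `fitted_acc` and `cv_acc` are illustrative only and not part of the H2O API.

```r
library(h2o)
h2o.init(nthreads = -1)

train <- as.h2o(iris)
y <- "Species"
x <- setdiff(names(train), y)

gbm1 <- h2o.gbm(x = x, y = y,
                distribution = "multinomial",
                training_frame = train,
                seed = 1,
                nfolds = 5,
                fold_assignment = "Modulo",
                keep_cross_validation_predictions = TRUE)

# In-sample predictions: score the training frame with the final model
fitted_preds <- h2o.predict(gbm1, newdata = train)

# Cross-validated holdout predictions: each row was scored by a model
# that did not see that row during training
cvpreds_id <- gbm1@model$cross_validation_holdout_predictions_frame_id$name
cvpreds <- h2o.getFrame(cvpreds_id)

# Pull the labels into R and compare them with the actual classes
# (assumption: both prediction frames are in the same row order as train)
actual <- as.character(as.data.frame(train)[[y]])
fitted_label <- as.character(as.data.frame(fitted_preds)$predict)
cv_label <- as.character(as.data.frame(cvpreds)$predict)

fitted_acc <- mean(fitted_label == actual)  # usually optimistic
cv_acc <- mean(cv_label == actual)          # closer to out-of-sample accuracy
```

The in-sample accuracy is typically higher than the cross-validated one because the final model has already seen every training row, so the cross-validated predictions are usually the better basis for judging the model.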
Erin LeDell