Implementation of LIME on h2o modelling in R

Question

I want to apply LIME to a deep learning model built with h2o in R. To train the model I created H2OFrames, then converted them back to data frames before using them in LIME (the lime function, because LIME's explain function can't recognize an H2OFrame). That step runs without errors.

The next step is to call the explain function on the test data to generate explanations. Here R throws an error whether I pass a data frame or an H2OFrame.

This is the error generated when using a data frame:

Error in chk.H2OFrame(x) : must be an H2OFrame

This is the error generated when using an H2OFrame:

Error in UseMethod("permute_cases") : 
  no applicable method for 'permute_cases' applied to an object of class "H2OFrame"

if(!require(pacman))  install.packages("pacman")
pacman::p_load(h2o, lime, data.table, e1071)

data(iris)
h2o.init( nthreads = -1 )
h2o.no_progress()

# Split up the data set
iris <- as.h2o(iris)

split <- h2o.splitFrame( iris, c(0.6, 0.2), seed = 1234 )
iris_train <- h2o.assign( split[[1]], "train" ) # 60%
iris_valid <- h2o.assign( split[[2]], "valid" ) # 20%
iris_test  <- h2o.assign( split[[3]], "test" )  # 20%


output <- 'Species'
input <- setdiff(names(iris),output)


model_dl_1 <- h2o.deeplearning(
  model_id = "dl_1", 
  training_frame = iris_train, 
  validation_frame = iris_valid,
  x = input,
  y = output,
  hidden = c(32, 32, 32),
  epochs = 10, # hopefully converges earlier...
  score_validation_samples = 10000, 
  stopping_rounds = 5,
  stopping_tolerance = 0.01
)

pred1 <- h2o.predict(model_dl_1, iris_test)
list(dimension = dim(pred1), pred1$predict)

# Convert the H2OFrames back to data frames

train_org <- as.data.frame(iris_train)  # training H2OFrame -> data frame
sapply(train_org, class)                # check the column classes of train_org
test_df <- as.data.frame(iris_test)     # test H2OFrame -> data frame
test_sample <- test_df[1, ]             # first row of the test data

# This works
# lime() builds the explainer from the training data
explain <- lime(train_org, model_dl_1, bin_continuous = FALSE,
                n_bins = 5, n_permutations = 1000)


# Explain new observation
explanation <- explain(test_sample, n_labels = 1, n_features = 1)
h2o.shutdown(prompt=F)

Can anyone please help me find a solution, or a way to call LIME's explain function with the appropriate data frame?

Please provide a fully-reproducible code example as well as version info about the lime and h2o R packages. — Erin LeDell, Jul 12 '17 at 18:17
You need to update the code in your post so that it's reproducible -- it can be any dataset (iris would be fine). Please see Stack Overflow guidelines about MCVE here: https://stackoverflow.com/help/mcve If I can't copy/paste your code to help you debug the code, then it's not an MCVE. — Erin LeDell, Jul 12 '17 at 18:58
@ErinLeDell, thank you for the feedback, I'll make the changes. — gattaca, Jul 12 '17 at 19:00
@ErinLeDell, I have posted the complete code. Please can you have a look. Thank you for your time — gattaca, Jul 12 '17 at 19:59
Can you clarify which of your two error messages happens with the code you posted, and which line it happens on? — Darren Cook, Aug 29 '17 at 08:48

Matt Dancho · Accepted Answer · 2017-09-17T13:07:43.490

Under the hood, the lime package uses two functions, predict_model() and model_type(), that you need to set up for any model that is not currently supported.

For your specific example, here's what you need to do.

Step 1: Set up a model_type method for models of class H2OMultinomialModel. All you do here is tell lime what type of model it is dealing with, such as "classification" or "regression".

model_type.H2OMultinomialModel <- function(x, ...) {
    # Function tells lime() what model type we are dealing with
    # 'classification', 'regression', 'survival', 'clustering', 'multilabel', etc
    #
    # x is our h2o model

    return("classification")

}
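Defining a function with this name is enough because the method is found by S3 dispatch on the model's class. Here is a minimal base R sketch of that mechanism. The generic `describe_model` and the mock object are hypothetical stand-ins, not part of lime or h2o, and the mock is a plain list rather than a real h2o model:

```r
# Hypothetical generic, standing in for lime's model_type()
describe_model <- function(x, ...) UseMethod("describe_model")

# Method picked up for objects whose class is "H2OMultinomialModel"
describe_model.H2OMultinomialModel <- function(x, ...) "classification"

# Mock objects that carry only a class attribute (not real h2o objects)
mock_model <- structure(list(), class = "H2OMultinomialModel")
mock_frame <- structure(list(), class = "H2OFrame")

# Dispatch succeeds for the class that has a method
result <- describe_model(mock_model)
print(result)

# With no method for the class, UseMethod() raises the same kind of
# "no applicable method" error quoted in the question
msg <- tryCatch(describe_model(mock_frame),
                error = function(e) conditionMessage(e))
print(msg)
```

The second call fails only because no `describe_model.H2OFrame` method (and no default method) exists, which is the situation lime is in when it meets a class it has no method for.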

Step 2: Set up a predict_model method for models of class H2OMultinomialModel. The key here is understanding that for lime to work it needs the classification probabilities rather than the predicted class (this took me a little while to figure out, and it has to do with the lime:::output_type(explanation) variable).

predict_model.H2OMultinomialModel <- function(x, newdata, type, ...) {
    # Function performs prediction and returns a data frame of class probabilities
    #
    # x is the h2o model
    # newdata is a data frame
    # type is not used here; the return value is always a data frame

    pred <- h2o.predict(x, as.h2o(newdata))

    # return classification probabilities only
    return(as.data.frame(pred[,-1]))

}
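To see what `pred[,-1]` leaves behind, here is a sketch on a plain data frame. It assumes the prediction table for a three-class model has a `predict` label column followed by one probability column per class. The values are made up for illustration and are not real model output:

```r
# Mock prediction table with made-up values (assumed layout, not real output)
pred <- data.frame(
  predict    = c("setosa", "virginica"),
  setosa     = c(0.90, 0.05),
  versicolor = c(0.07, 0.15),
  virginica  = c(0.03, 0.80)
)

# Drop the first (label) column, keeping only the class probabilities
probs <- pred[, -1]

print(names(probs))    # the three class names
print(rowSums(probs))  # each row should sum to approximately 1
```

If the label column were left in, lime would receive a mix of a character column and numeric columns instead of a purely numeric probability table.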

Once you set these functions up properly, you can run your lime scripts.

# Build the explainer from the training data
explainer <- lime(train_org, model_dl_1, bin_continuous = FALSE, n_bins = 5, n_permutations = 1000)

# Explain new observation
explanation <- explain(test_sample, explainer, n_labels = 1, n_features = 1)

Note that the `lime` package has since been updated to integrate `h2o`. You may need to download the GitHub version here: https://github.com/thomasp85/lime — Matt Dancho, Oct 27 '17 at 16:50
