I think I have a data type casting problem when using the H2O platform in R.

This is the error:

Error: water.exceptions.H2OModelBuilderIllegalArgumentException: Illegal argument(s) for GBM model: GBM_model_R_1568616391145_4. Details: ERRR on field: _validation_frame: Test/Validation dataset has a categorical response column 'gold' with no levels in common with the model

And this is the code:

library(h2o)
kd_h2o = h2o.init(nthreads = -1)
data = readxl::read_excel("C:\\Users\\frzd\\Desktop\\mtx.xlsx")
data_order <- data[order(data$gold),]
data_order$gold=h2o.asfactor(data_order$gold)
Split_ts = .2
Split_vl = .1
indx <- 1:round(length(data$gold)*Split_ts)
ts <- max(indx)
ts <- round(indx*length(data$gold)/ts)
test = as.h2o(data_order[ts,])
train = data_order[-ts,]
indx <- 1:round(length(train$gold)*Split_vl)
ts <- max(indx)
ts <- round(indx*length(train$gold)/ts)
valid = as.h2o(train[ts,])
train = as.h2o(train[-ts,])

fit <- h2o.gbm(y = 15, 
              training_frame = train, 
              validation_frame=valid,
                    # cvControl = list(V = 5),
               )

Could you help me out? :)

kath
felegant
  • Hi Farzad. Welcome to SO. It is not easy to help you without a minimal reproducible [example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – Majid Sep 16 '19 at 07:33
  • It will be difficult to help you without any information about your data. However, I think you messed up the train/validation/test split by first ordering the data by the gold column. The levels (as the error says) are not the same for the training and the validation set. – kath Sep 16 '19 at 07:38
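
The mismatch described in the comment above can be checked before any model is trained. The following is a minimal base-R sketch on a made-up data frame: the `gold` values, the variable names, and the every-third-row split are hypothetical and are not the asker's data. It lists the response levels that occur in the validation rows but never in the training rows.

```r
# Toy stand-in for the real data: a numeric response with unique values.
toy <- data.frame(gold = c(1200.5, 1210.0, 1215.3, 1230.8, 1242.1, 1250.6))

# Casting a numeric response to a factor makes every distinct value its own level.
toy$gold <- factor(toy$gold)

# Hypothetical split: every third row goes to validation, the rest to training.
valid_rows <- seq(3, nrow(toy), by = 3)
valid_toy <- toy[valid_rows, , drop = FALSE]
train_toy <- toy[-valid_rows, , drop = FALSE]

# Levels actually present in each subset (droplevels removes unused ones).
train_levels <- levels(droplevels(train_toy$gold))
valid_levels <- levels(droplevels(valid_toy$gold))

# Validation levels that never appear in training, and the ones both share.
unseen <- setdiff(valid_levels, train_levels)
shared <- intersect(valid_levels, train_levels)
print(unseen)
print(length(shared))
```

If `shared` comes back empty on the real data, that would be consistent with the "no levels in common" message in the error, and it suggests the response may be better treated as numeric than as categorical.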

1 Answer

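I figured it out.

The problem was the way I was using the data frames. Here is the corrected code. Compared with the question, it applies h2o.asfactor() to the whole data frame instead of only the gold column, and it sets distribution = "gaussian" in h2o.gbm():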

# load the package and start (or connect to) a local H2O instance
library(h2o)
h2o.init(nthreads = -1)

# data preparation
data = readxl::read_excel("C:\\Users\\frzd\\Desktop\\mtx.xlsx")
data_order <- data[order(data$gold),]
data_order=h2o.asfactor(data_order)

# data split
Split_ts = .2
Split_vl = .1
indx <- 1:round(length(data$gold)*Split_ts)
ts <- max(indx)
ts <- round(indx*length(data$gold)/ts)
test = as.h2o(data_order[ts,])
train = data_order[-ts,]

indx <- 1:round(length(train$gold)*Split_vl)
ts <- max(indx)
ts <- round(indx*length(train$gold)/ts)
valid = as.h2o(train[ts,])
train = as.h2o(train[-ts,])

# perform fitting
fit <- h2o.gbm(y = 15,
               distribution = "gaussian",
               training_frame = train,
               validation_frame = valid
               # cvControl = list(V = 5),
               )
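
As an alternative to the evenly spaced index arithmetic above, a shuffled split is a common way to draw the three subsets. The sketch below is plain base R on a made-up data frame. The function name `split_three_way`, the seed, and the toy data are assumptions for illustration and are not part of the original code. As in the code above, the validation fraction is taken from the rows left after the test set is removed.

```r
# Hypothetical helper: shuffle the row indices once, then cut them into
# test, validation and training pieces that do not overlap.
split_three_way <- function(df, test_frac = 0.2, valid_frac = 0.1, seed = 42) {
  set.seed(seed)                       # make the shuffle reproducible
  shuffled <- sample(nrow(df))         # random permutation of the row numbers

  n_test <- round(nrow(df) * test_frac)
  test_idx <- shuffled[seq_len(n_test)]
  rest_idx <- setdiff(shuffled, test_idx)

  n_valid <- round(length(rest_idx) * valid_frac)
  valid_idx <- rest_idx[seq_len(n_valid)]
  train_idx <- setdiff(rest_idx, valid_idx)

  list(
    train = df[train_idx, , drop = FALSE],
    valid = df[valid_idx, , drop = FALSE],
    test  = df[test_idx, , drop = FALSE]
  )
}

# Toy data standing in for the spreadsheet contents.
toy <- data.frame(gold = seq(1200, 1299), x = seq_len(100))
parts <- split_three_way(toy)
print(sapply(parts, nrow))
```

Each element of `parts` could then be handed to as.h2o() in the same way as `train`, `valid` and `test` above.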
felegant