I've built a custom model with caret
using vglm()
from VGAM
. It works fine with simple effects but when I try to add interactions it fails with a object 'x1:x2' not found
error message, where x1
and x2
are predictor variables that I've entered into the model as an interaction. The problem has to do with the predictions, and unless I'm mistaken it seems to occur because either predict.train
or predictvglm
tries to use x1:x2
to predict classes.
I've provided a working example below.
# Set up data
set.seed(123)
n <- 100
x1 <- rnorm(n, 175, 7)
x2 <- rnorm(n, 30, 8)
cont <- 0.5 * x1 - 0.3 * x2 + 10 + rnorm(n, 0, 6)
y <- cut(cont, breaks = quantile(cont), include.lowest = TRUE,
labels = c("A", "B", "C", "D"), ordered = TRUE)
d <- data.frame(x1, x2, y)
# My custom caret function
vglmTrain <- list(
label = "VGAM prop odds",
library = "VGAM",
loop = NULL,
type = "Classification",
parameters = data.frame(parameter = "parameter",
class = "character",
label = "parameter"),
grid = function(x, y,
len = NULL, search = "grid") data.frame(parameter = "none"),
fit = function(x, y, wts, param, lev, last, classProbs, ...) {
dat <- if(is.data.frame(x)) x else as.data.frame(x)
dat$.outcome <- y
if(!is.null(wts))
{
out <- vglm(.outcome ~ ., propodds, data = dat, weights = wts, ...)
} else {
out <- vglm(.outcome ~ ., propodds, data = dat, ...)
}
out
},
predict = function(modelFit, newdata, preProc = NULL, submodels = NULL) {
probs <- predict(modelFit, data.frame(newdata), type = "response")
predClass <- function (x) {
n <- colnames(x)
factor(as.vector(apply(x, 1, which.max)),
levels = 1:length(n),
labels = n)
}
predClass(probs)
},
prob = function(modelFit, newdata, preProc = NULL, submodels = NULL)
predict(modelFit, data.frame(newdata), type = "response"),
predictors = function(x, ...) names(attributes(terms(x))$dataClasses[-1]),
levels = function(x) x@misc$ynames,
sort = function(x) x)
Now if I try to use the function, it exits with an error if I supply the formula with an interaction.
# Load caret
library(caret)
ctrl <- trainControl(method = "cv", number = 2, verboseIter = T)
# A model with no interactions - works
f1 <- train(y ~ x1 + x2, data = d,
method = vglmTrain,
trControl = ctrl)
# A model with interactions - fails
f2 <- train(y ~ x1*x2, data = d,
method = vglmTrain,
trControl = ctrl)
Error in train.default(x, y, weights = w, ...) : Stopping
In addition: Warning messages:
1: In eval(expr, envir, enclos) :
predictions failed for Fold1: parameter=none Error in eval(expr, envir, enclos) : object 'x1:x2' not found
2: In eval(expr, envir, enclos) :
predictions failed for Fold2: parameter=none Error in eval(expr, envir, enclos) : object 'x1:x2' not found
3: In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
There were missing values in resampled performance measures.
Here's my sessionInfo():
> sessionInfo()
R version 3.2.4 (2016-03-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] splines stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] VGAM_1.0-0 caret_6.0-64 ggplot2_2.1.0 lattice_0.20-33
loaded via a namespace (and not attached):
[1] Rcpp_0.12.3 magrittr_1.5 MASS_7.3-45 munsell_0.4.3 colorspace_1.2-6 foreach_1.4.3 minqa_1.2.4 stringr_1.0.0 car_2.1-1
[10] plyr_1.8.3 tools_3.2.4 nnet_7.3-12 pbkrtest_0.4-6 parallel_3.2.4 grid_3.2.4 gtable_0.2.0 nlme_3.1-125 mgcv_1.8-12
[19] quantreg_5.21 e1071_1.6-7 class_7.3-14 MatrixModels_0.4-1 iterators_1.0.8 lme4_1.1-11 Matrix_1.2-3 nloptr_1.0.4 reshape2_1.4.1
[28] codetools_0.2-14 stringi_1.0-1 compiler_3.2.4 scales_0.4.0 SparseM_1.7
Does anybody have any idea how to fix this?