
I have the following R code, in which I try to train an SVM model:

library(base)
library(caret)
library(iml)
library(tidyverse)

dataset <- read_csv("https://gist.githubusercontent.com/dmpe/bfe07a29c7fc1e3a70d0522956d8e4a9/raw/7ea71f7432302bb78e58348fede926142ade6992/pima-indians-diabetes.csv", col_names=FALSE)
X = dataset[, 1:8]
Y = as.factor(ifelse(dataset$X9 == 1, 'diabetes', 'nondiabetes'))

set.seed(88)

nfolds <- 3
cvIndex <- createFolds(Y, nfolds, returnTrain = T)

fit.control <- trainControl(method="cv",
                            index=cvIndex,
                            number=nfolds,
                            classProbs=TRUE,
                            savePredictions=TRUE,
                            verboseIter=TRUE,
                            summaryFunction=twoClassSummary,
                            allowParallel=FALSE)

model <- caret::train(X, Y,
                      method = "svmLinear",
                      trControl = fit.control,
                      preProcess=c("center","scale"),
                      tuneLength=10)

pred <- Predictor$new(model$finalMode, data=dataset)
pdp <- FeatureEffect$new(pred, "X1", method="pdp")

However, the Predictor call throws the error shown in the title. Any ideas why this is happening and how to overcome it?

pat-s

1 Answer


You don't need to extract model$finalModel (and note the typo in that line: you wrote $finalMode, without the final l). Instead, call predict on the train object itself:

pred <- predict(model, newdata, type = "prob")

and caret will automatically use the final model, fitted with the best-scoring tuning parameters. With type = "prob", the output contains one column of class probabilities per class (diabetes in column 1, nondiabetes in column 2), and the two are complementary. If you want a specific model from the caret 'model' object, I believe you can pick it out (as in your previous folds question), but I've never done it and am not sure how.
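As a rough sketch of what that predict call returns, here is a small self-contained example. It assumes the caret and kernlab packages are installed and uses a two-class subset of the built-in iris data as a stand-in for the diabetes data; the names two_class, X_demo, Y_demo and demo_model are made up for the illustration.

```r
library(caret)

# Two-class subset of iris, standing in for the diabetes data (assumption).
two_class <- droplevels(subset(iris, Species != "setosa"))
X_demo <- two_class[, 1:4]
Y_demo <- two_class$Species

set.seed(88)
ctrl <- trainControl(method = "cv", number = 3, classProbs = TRUE)
demo_model <- train(X_demo, Y_demo,
                    method = "svmLinear",
                    trControl = ctrl,
                    preProcess = c("center", "scale"))

# Predict with the whole train object rather than demo_model$finalModel,
# so caret reapplies the centering and scaling before predicting.
probs <- predict(demo_model, newdata = X_demo, type = "prob")
head(probs)
```

The result is a data frame with one column per factor level, in the order given by levels(Y_demo).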

For your partial dependence plot, I use the pdp package, so something like this should work:

library(pdp)
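# Change this to the name of whichever variable you are interested in.
varname <- "X1"
# chull = TRUE restricts the grid to the convex hull of the training data;
# prob = TRUE returns probabilities instead of the default logit scale.
partial(model, pred.var = varname,
        train = X, chull = TRUE, prob = TRUE, progress = "text")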

where train is the data you trained your model on (X in your case, I think).
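If you would rather stay with iml, a minimal sketch along the same lines is to hand the whole caret train object to Predictor$new instead of the final model. This assumes that the installed iml version accepts caret train objects and the type and class arguments as used here, and that caret and kernlab are available; the iris subset and the names X_demo, Y_demo, demo_model, predictor and effect are illustrative only.

```r
library(caret)
library(iml)

# Two-class subset of iris, standing in for the diabetes data (assumption).
two_class <- droplevels(subset(iris, Species != "setosa"))
X_demo <- two_class[, 1:4]
Y_demo <- two_class$Species

set.seed(88)
demo_model <- train(X_demo, Y_demo,
                    method = "svmLinear",
                    trControl = trainControl(method = "cv", number = 3,
                                             classProbs = TRUE),
                    preProcess = c("center", "scale"))

# Wrap the full train object (not $finalModel) and ask for the
# predicted probability of one class.
predictor <- Predictor$new(demo_model, data = X_demo, y = Y_demo,
                           type = "prob", class = "virginica")

# Partial dependence of that probability on a single feature.
effect <- FeatureEffect$new(predictor, feature = "Sepal.Length", method = "pdp")
head(effect$results)
# plot(effect)  # draws the partial dependence curve
```

For the diabetes data, the equivalent would presumably be to pass model, data = X, y = Y and feature = "X1".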

Jon
  • I would just add that R is largely a functional language: you call a function (e.g. partial) and pass it your arguments. In my admittedly limited experience, Python differs in that you often take an object and call " .method " on it, which looks a bit like the syntax you're using in your pdp and model calls. – Jon Jun 28 '19 at 07:47