
I'm doing a homework assignment where I'm asked to use the bootstrap on a support vector machine to estimate the class probability. I've managed that. Next, I am asked to use these probabilities and the true test set labels to plot an ROC curve for this SVM model (using the packages e1071 and ROCR). What I struggle with is how to use these probabilities to construct a ROCR::prediction object, which I will need to construct an ROCR::performance object, which I will need to plot the ROC curve.

I feel like I'm really stuck. Will I need to use these bootstrapped class probabilities to create a new SVM? If so, how? If not, how do I get from these class probabilities to an ROC curve?

A reproducible example:

set.seed(123)
library(e1071)
library(ROCR)
library(purrr)


### make some data

category_labels <- sample(c(-1, 1), 1000, replace = TRUE)
predictor1 <- rnorm(1000, 0, 0.1)
predictor2 <- rnorm(1000, 0, 0.1)

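# store the labels as a factor so that svm() fits a classifier rather than a regression
my_df <- data.frame(category_labels = factor(category_labels), predictor1, predictor2)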

### 50/50 training/testing split 

train <- sample(nrow(my_df), 500)
df_train <- my_df[train,]
df_test <- my_df[-train,]

### make 200 bootstrap datasets

df_train_boot <- replicate(200, df_train[sample(500, 500, replace = TRUE),], simplify = FALSE)

### make helper function for bootstrap

calculate_class_prob <- function(x){
  # fit an SVM on one bootstrap sample
  tmp_fit <- svm(category_labels ~ ., data = x, kernel = "radial", cost = 0.1)
  # predict on the shared test set
  tmp_pred <- predict(tmp_fit, newdata = df_test)
  return(tmp_pred)
}

### Run bootstrap

bootstrap_class_prob <- map_dfc(.x = df_train_boot, .f = calculate_class_prob)

### Get class probability

minusones <- sum(unlist(lapply(lapply(bootstrap_class_prob, table), "[[", 1)))/200/NROW(bootstrap_class_prob)
ones <- sum(unlist(lapply(lapply(bootstrap_class_prob, table), "[[", 2)))/200/NROW(bootstrap_class_prob)
Tea Tree
  • You should just be able to use the `prediction()` function passing the probabilities for the `predictions=` parameter and the true values as the `labels=` parameter. It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input that can be used to test and verify possible solutions. – MrFlick Dec 14 '20 at 19:13
  • I'm struggling to create a reproducible example without giving away the solution to the entire homework problem. I'll give it a try. – Tea Tree Dec 14 '20 at 20:09
  • I just received an answer from a classmate. The goal is to get the average predicted class probability for each test observation (averaged over the bootstrap replicates), not one overall probability for the whole test set. This resolves my problem. – Tea Tree Dec 14 '20 at 20:53
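
For anyone landing here later, the two comments above (average per test observation, then pass the result to `prediction()`) can be sketched roughly as follows. This is only a minimal illustration under my own assumptions, not the assignment's solution: the toy data, the names `toy`, `votes`, `prob_one` and `truth`, and the choice of 50 replicates are all hypothetical, and the "probability" here is simply the share of bootstrap replicates that vote for class 1.

```r
library(e1071)
library(ROCR)

set.seed(123)

# Hypothetical toy data: labels in {-1, 1}, predictors shifted by class
n   <- 400
lab <- sample(c(-1, 1), n, replace = TRUE)
toy <- data.frame(y  = factor(lab),
                  x1 = rnorm(n, mean = 0.5 * lab),
                  x2 = rnorm(n, mean = 0.5 * lab))

train_idx <- sample(n, n / 2)
toy_train <- toy[train_idx, ]
toy_test  <- toy[-train_idx, ]
truth     <- lab[-train_idx]   # numeric test labels, -1 or 1

# One column of predicted labels per bootstrap replicate
# (rows = test observations, columns = replicates)
n_boot <- 50
votes <- sapply(seq_len(n_boot), function(b) {
  boot <- toy_train[sample(nrow(toy_train), replace = TRUE), ]
  fit  <- svm(y ~ ., data = boot, kernel = "radial", cost = 0.1)
  as.character(predict(fit, newdata = toy_test))
})

# Per-observation estimate: share of replicates that predicted class 1
prob_one <- rowMeans(votes == "1")

# ROCR treats the larger numeric label (1) as the positive class
pred <- prediction(predictions = prob_one, labels = truth)
perf <- performance(pred, measure = "tpr", x.measure = "fpr")
plot(perf)

auc <- performance(pred, measure = "auc")@y.values[[1]]
```

The key point is that `prob_one` has one value per test observation, so it lines up element by element with the true test labels, which is what `prediction()` expects. No second SVM is needed.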
