I have a question regarding the output format of the probabilities for each observation.
Current output:
                0            1
2282 9.791608e-01 2.083920e-02
135  4.769759e-01 5.230241e-01
2036 9.807866e-01 1.921336e-02
Desired output (what the example below produces):
           1         0
9  0.4268682 0.5731318
10 0.4268682 0.5731318
4  0.4268682 0.5731318
7  0.2590067 0.7409933
2  0.2590067 0.7409933
With the reproducible example below, I get the desired output: probabilities printed as plain decimals between 0 and 1. However, when I run exactly the same code on the larger data set I actually want to analyze (1100 variables and 10000 observations, where every cell is either 0 or 1, so the same kind of data at a larger size), I get the current output shown above.
# data preparation
A <- c(1, 0, 1, 1, 0, 0, 0, 1, 0, 0)
B <- c(1, 0, 1, 1, 1, 1, 0, 0, 1, 1)
C <- c(0, 1, 1, 1, 1, 1, 1, 1, 1, 1)
train <- data.frame(A, B, C)
train[] <- lapply(train, as.factor)
# randomize data
train <- train[sample(nrow(train)),]
# create equal-sized folds (2 in this small example)
number_folds <- 2
folds <- cut(seq_len(nrow(train)), breaks = number_folds, labels = FALSE)
# vector to store the accuracy of each fold, initialized with 0s
accuracy_SVM <- rep(0, number_folds)
# install e1071 if it is missing, then load it (required for SVM and NB)
if (!requireNamespace("e1071", quietly = TRUE)) install.packages("e1071")
library("e1071")
# Cross validation, data segmentations and running the model
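for (i in 1:number_folds) {
  # rows belonging to fold i form the test set; the rest form the training set
  testIndexes <- which(folds == i, arr.ind = TRUE)
  testData <- train[testIndexes, ]
  trainData <- train[-testIndexes, ]
  # fit the SVM with probability estimates enabled
  SVM_model <- svm(A ~ ., data = trainData, probability = TRUE)
  # predict classes; the probabilities are attached as an attribute
  classification_svm <- predict(SVM_model, testData, type = "response", probability = TRUE)
  # share of correctly classified test observations in this fold
  accuracy_SVM[i] <- sum(classification_svm == testData$A) / nrow(testData)
}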
attr(classification_svm, "probabilities")
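One thing that may be worth checking is whether the two outputs differ only in how R prints the numbers, since 9.791608e-01 is simply 0.9791608 written in scientific notation. The sketch below is only an illustration under that assumption and is not part of the code above: `probs` is a hypothetical matrix built from the three rows of the current output, standing in for the matrix returned by the last line.

```r
# Hypothetical stand-in for attr(classification_svm, "probabilities"),
# filled with the values copied from the "current output" above.
probs <- matrix(
  c(9.791608e-01, 4.769759e-01, 9.807866e-01,
    2.083920e-02, 5.230241e-01, 1.921336e-02),
  ncol = 2,
  dimnames = list(c("2282", "135", "2036"), c("0", "1"))
)

# Option 1: convert to fixed-notation strings, for display only
probs_fixed <- format(probs, scientific = FALSE)
print(probs_fixed, quote = FALSE)

# Option 2: round before printing, which keeps the matrix numeric
probs_rounded <- round(probs, 7)
print(probs_rounded)
```

Option 1 returns a character matrix, so it is suitable for inspecting the values but not for further calculations; Option 2 stays numeric but drops digits beyond the seventh decimal place.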
Sorry that I cannot share my own data set to help you reproduce the output, and I couldn't formulate the question any more clearly. Any help would be much appreciated! :)