I am using naive Bayes to classify my observations into 3 classes (S1, S2 and S3), depending on the value of the variable SC_3ans. However, the model puts most observations into S1 and never predicts S2, even though some observations in the test set belong to S2. As you can see in the confusion matrix, 0 observations have been classified as S2. I tried changing the size of the test set, but that made no difference. How can I fix this?
set.seed(2)
# Randomly assign each row to the training set (1, about 70%) or the test set (2, about 30%)
id <- sample(2, nrow(Data), prob = c(0.7, 0.3), replace = TRUE)
Datatrain <- Data[id == 1, ]
Datatest <- Data[id == 2, ]
library(e1071)
library(caret)
# Response and predictors used to fit the model
y <- Datatrain$SC_3ans_segment
x <- Datatrain[, names(Datatrain) %in% c("TYPE_CONTRACTUALISATION","WEBSERVICE_MANUEL","REGION","RENFORT","ECO","TRANCHE_ANC_2021","GAR_PRODUIT","TRANCHE_AGE","SOU_GRP_SITUATION_FAMILLE","REGIME","Type_Distribution","PTF_2022","GAR_FORMULE_GROUPE")]
# Naive Bayes through caret, evaluated with 10-fold cross-validation
Data_nb_model <- caret::train(x, y, 'nb', trControl = trainControl(method = 'cv', number = 10))
# Predict on the held-out test set and compare predictions with the true classes
Test_model <- predict(object = Data_nb_model, newdata = Datatest)
confusionMatrix(table(Test_model, Datatest$SC_3ans_segment))
This is the output:
Confusion Matrix and Statistics

Test_model    S1    S2    S3
        S1 10349  1023  4913
        S2     0     0     0
        S3  1637   231  1492

Overall Statistics

               Accuracy : 0.6027
                 95% CI : (0.5959, 0.6096)
    No Information Rate : 0.6101
    P-Value [Acc > NIR] : 0.9833

                  Kappa : 0.094

 Mcnemar's Test P-Value : <2e-16

Statistics by Class:

                     Class: S1 Class: S2 Class: S3
Sensitivity             0.8634   0.00000   0.23294
Specificity             0.2250   1.00000   0.85891
Pos Pred Value          0.6355       NaN   0.44405
Neg Pred Value          0.5128   0.93617   0.69831
Prevalence              0.6101   0.06383   0.32604
Detection Rate          0.5268   0.00000   0.07595
Detection Prevalence    0.8290   0.00000   0.17104
Balanced Accuracy       0.5442   0.50000   0.54593
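To make the output above easier to check, here is a minimal base R sketch that rebuilds the matrix from the reported counts and recomputes a few of the statistics by hand. It assumes rows are predictions and columns are the true classes, which is how `table(Test_model, Datatest$SC_3ans_segment)` lays them out. The names `cm`, `n`, `accuracy`, `prevalence`, `sensitivity` and `predicted_share` are only illustrative and do not appear in the original code.

```r
# Counts copied from the confusion matrix above
# (assumption: rows = predicted class, columns = actual class).
cm <- matrix(
  c(10349, 1023, 4913,
        0,    0,    0,
     1637,  231, 1492),
  nrow = 3, byrow = TRUE,
  dimnames = list(Predicted = c("S1", "S2", "S3"),
                  Actual    = c("S1", "S2", "S3"))
)

n               <- sum(cm)                 # total number of test observations
accuracy        <- sum(diag(cm)) / n       # share of correct predictions
prevalence      <- colSums(cm) / n         # share of each actual class
sensitivity     <- diag(cm) / colSums(cm)  # per-class recall
predicted_share <- rowSums(cm) / n         # "Detection Prevalence" in caret's output

round(accuracy, 4)
round(prevalence, 4)
round(sensitivity, 4)
round(predicted_share, 4)
```

The recomputed values should agree with the ones caret prints: S2 accounts for only about 6% of the test set, the model predicts S1 for about 83% of the rows, and the overall accuracy (0.6027) is slightly below the no-information rate (0.6101).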