I am new to R and want to solve a binary classification task.
The dataset has a factor variable LABELS with 2 classes: the first is 0, the second is 1. The next image shows the head of the dataset:
The TimeDate column is just an index.
The class distribution is computed as follows:
print("the number of values with % in factor variable - LABELS:")
percentage <- prop.table(table(dataset$LABELS)) * 100
cbind(freq=table(dataset$LABELS), percentage=percentage)
I also know that the Slot2 column is calculated with the formula:
Slot2 = Var3 - Slot3 + Slot4
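A quick way to confirm that relationship on the data is sketched below. The data frame `toy` and its values are made up for illustration; only the column names come from my dataset.

```r
# Hypothetical toy data: column names follow the dataset, values are invented.
toy <- data.frame(Var3  = c(10, 20, 30),
                  Slot3 = c(1, 2, 3),
                  Slot4 = c(4, 5, 6))

# Build Slot2 from the stated formula.
toy$Slot2 <- toy$Var3 - toy$Slot3 + toy$Slot4

# Check that the stored column matches the formula,
# allowing for floating-point tolerance.
formula_holds <- isTRUE(all.equal(toy$Slot2, toy$Var3 - toy$Slot3 + toy$Slot4))
print(formula_holds)
```

On the real data, the same `all.equal()` comparison would be run against `dataset` instead of `toy`.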
The features Var1, Var2, Var3 and Var4 were selected after analysing the correlation matrix.
Before modeling, I split the dataset into train and test parts. I then tried to build a random forest model for the binary classification task using the following code:
library(randomForest)
rf2 <- randomForest(LABELS ~ Var1 + Var2 + Var3 + Var4,
                    data = train, ntree = 100,
                    mtry = 4, importance = TRUE)
print(rf2)
The result is:
Call:
randomForest(formula = LABELS ~ Var1 + Var2 + Var3 + Var4,
data = train, ntree = 100, mtry = 4, importance = TRUE)
Type of random forest: classification
Number of trees: 100
No. of variables tried at each split: 4
OOB estimate of error rate: 0.16%
Confusion matrix:
0 1 class.error
0 164957 341 0.002062941
1 280 233739 0.001196484
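As a sanity check, the reported error rates can be reproduced from the confusion matrix counts alone. This is a minimal sketch that assumes rows are the actual classes and columns the predicted ones. The names `cm`, `class_error` and `oob_error` are just illustrative.

```r
# Counts copied from the printed confusion matrix
# (rows = actual class, columns = predicted class).
cm <- matrix(c(164957,    341,
                  280, 233739),
             nrow = 2, byrow = TRUE,
             dimnames = list(actual = c("0", "1"), predicted = c("0", "1")))

# Per-class error: share of each actual class that was misclassified.
class_error <- 1 - diag(cm) / rowSums(cm)

# Overall error: share of all observations that were misclassified.
oob_error <- 1 - sum(diag(cm)) / sum(cm)

print(class_error)
print(round(100 * oob_error, 2))  # percentage, matches the 0.16% above
```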
Then I tried to generate predictions and confusion matrices:
# Prediction & Confusion Matrix - train data
p1 <- predict(rf2, train, type="prob")
print("Prediction & Confusion Matrix - train data")
confusionMatrix(p1, train$LABELS)
# Prediction & Confusion Matrix - test data
p2 <- predict(rf2, test, type="prob")
print("Prediction & Confusion Matrix - test data")
confusionMatrix(p2, test$LABELS)
I received an error in R:
[1] "Prediction & Confusion Matrix - train data"
Error: `data` and `reference` should be factors with the same levels.
Traceback:
1. confusionMatrix(p1, train$LABELS)
2. confusionMatrix.default(p1, train$LABELS)
3. stop("`data` and `reference` should be factors with the same levels.",
. call. = FALSE)
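One thing worth checking here, as a sketch rather than a confirmed diagnosis: `predict()` with `type="prob"` returns a numeric matrix of class probabilities, not a factor, while the error message says both arguments must be factors. The toy matrix `p` below is a made-up stand-in for `p1`, and `reference` is a stand-in for `train$LABELS`. Only base R is used.

```r
# Made-up stand-in for predict(rf2, train, type = "prob"):
# one row per observation, one column per class.
p <- matrix(c(0.9, 0.1,
              0.2, 0.8,
              0.4, 0.6),
            ncol = 2, byrow = TRUE,
            dimnames = list(NULL, c("0", "1")))

# Made-up stand-in for train$LABELS.
reference <- factor(c("0", "1", "0"), levels = c("0", "1"))

print(is.factor(p))          # the probability matrix is not a factor
print(is.factor(reference))  # the reference labels are

# Convert probabilities to class labels by taking the most likely column,
# then build a factor with the same levels as the reference.
pred_class <- factor(colnames(p)[max.col(p, ties.method = "first")],
                     levels = levels(reference))

# Both inputs are now factors with identical levels.
print(identical(levels(pred_class), levels(reference)))
print(table(Predicted = pred_class, Actual = reference))
```

If this is indeed the cause, calling `predict()` without `type="prob"` (the default returns predicted classes) should give a factor directly, which could then be passed to `confusionMatrix()`.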
I have already tried to fix it using ideas from the following questions:
Error in ConfusionMatrix the data and reference factors must have the same number of levels R CARET
Error in Confusion Matrix : the data and reference factors must have the same number of levels
but they did not help in my case.
Could you please help me with this error?
I would appreciate any ideas and comments. Thank you in advance.