I have tried to build a random forest with some help from YouTube. I am really new to ML concepts, so I tried to leave everything at its default.
First, I gave my training set two factors, each of which has some variance on its own.
The main problem is that I get 0% accuracy: it looks like everything is predicted as 0, because that is the majority value (70% are 0, 30% are 1).
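To show what I mean by the 70/30 split, here is a minimal sketch on made-up labels, not the real casualty data. The names `label`, `pred` and `conf` are only for illustration, and the "model" is just an assumption that always predicts the majority class, which is what my forest seems to be doing.

```r
# Made-up labels with the same 70/30 split as in my data (assumption, not the real file)
label <- factor(c(rep(0, 70), rep(1, 30)), levels = c(0, 1))

# Share of each class in the labels
class.share <- prop.table(table(label))

# A stand-in "model" that always predicts the majority class
majority <- names(which.max(table(label)))
pred <- factor(rep(majority, length(label)), levels = levels(label))

# Confusion matrix: rows are the true classes, columns are the predictions
conf <- table(actual = label, predicted = pred)

# Accuracy within each true class, and accuracy over all rows
per.class.acc <- diag(conf) / rowSums(conf)
overall.acc <- sum(diag(conf)) / sum(conf)

print(class.share)
print(conf)
print(per.class.acc)
print(overall.acc)
```

If this is close to what the forest does, one class would be always right and the other always wrong, while the overall accuracy would simply equal the share of the majority class.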
The code is here:
pedes.10 <- cas.10[which(cas.10$Casualty_Type == "0"),]
pedes.10$Age_Band_of_Casualty <- as.factor(pedes.10$Age_Band_of_Casualty)
# Recode severity: Injury is 1 for severity values 1 and 2, and 0 for value 3
# (ifelse is vectorized, so no loop over the rows is needed)
pedes.10$Injury <- ifelse(pedes.10$Casualty_Severity != "3", 1, 0)
# Start the random forest
library(randomForest)
rf.train.1 <- pedes.10[, c("Age_Band_of_Casualty", "Sex_of_Casualty")]
rf.label <- as.factor(pedes.10$Injury)
set.seed(9299)
rf.1 <- randomForest(x = rf.train.1, y = rf.label, importance = FALSE, ntree = 3000)
rf.1
varImpPlot(rf.1)
I get 0% accuracy on the first case and 100% on the other. I understand that I did something completely wrong, but I do not know what to do.
The data set is here: (Casualties 2010) --> https://data.gov.uk/dataset/cb7ae6f0-4be6-4935-9277-47e5ce24a11f/road-safety-data
Thanks for the help.
I am leaving the image here.