
I have tried to build a random forest with some help from YouTube. I am really new to ML concepts, so I tried to leave everything at the defaults.

First, I gave my training set 2 factors, each of which has some variance on its own.

The main problem is that I get 0% accuracy; it looks like everything is predicted as 0 because it is the majority value (70% are 0, 30% are 1).

The code is here:

pedes.10 <- cas.10[which(cas.10$Casualty_Type == "0"),]

pedes.10$Age_Band_of_Casualty <- as.factor(pedes.10$Age_Band_of_Casualty)

# Recode severity into a binary Injury column: 1 for severity 1 or 2, 0 for severity 3.
# ifelse() is vectorized, so no loop is needed.
pedes.10$Injury <- ifelse(pedes.10$Casualty_Severity != "3", 1, 0)

# Start the random forest (requires the randomForest package)
library(randomForest)
rf.train.1 <- pedes.10[, c("Age_Band_of_Casualty", "Sex_of_Casualty")]
rf.label <- as.factor(pedes.10$Injury)

set.seed(9299)

rf.1 <- randomForest(x = rf.train.1, y = rf.label, importance = FALSE, ntree = 3000)
rf.1
varImpPlot(rf.1)

I get 0% accuracy on the first class and 100% on the other. I understand that I did something completely wrong, but I do not know what to do...
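To make the symptom concrete, here is a minimal base R sketch of what happens when a classifier always predicts the majority class on a 70/30 split. The labels are synthetic (they are not the real casualty data), and the names `actual`, `predicted`, and `conf` are only illustrative. It suggests why one class can look perfect while the other gets nothing right, even though the overall accuracy is not 0%.

```r
# Synthetic labels with the same 70/30 split described above (not the real data)
actual <- factor(c(rep(0, 70), rep(1, 30)), levels = c(0, 1))

# A hypothetical model that always predicts the majority class, 0
predicted <- factor(rep(0, length(actual)), levels = c(0, 1))

# Confusion table: rows are actual classes, columns are predicted classes
conf <- table(actual = actual, predicted = predicted)

# Per-class accuracy: correct predictions divided by the number of cases in that class
per_class_acc <- diag(conf) / rowSums(conf)

# Overall accuracy: all correct predictions divided by all cases
overall_acc <- sum(diag(conf)) / sum(conf)

print(conf)
print(per_class_acc)
print(overall_acc)
```

When reading the printed `rf.1` output, note that the `class.error` column of the confusion matrix is an error rate, so it is one minus the per-class accuracy computed here.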

The data set is here (Casualties 2010): https://data.gov.uk/dataset/cb7ae6f0-4be6-4935-9277-47e5ce24a11f/road-safety-data

Thanks for the help.

Leaving the image here:


camille
ThomasTas
  • Hi! You don't need the for loop for the `ifelse()`. It's vectorized, try the same thing without the loop, look it up here if you get stuck. – RLave Apr 01 '19 at 06:56
  • Also post a reproducible example, read here: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example. `dput(pedes.10)` would be a start. – RLave Apr 01 '19 at 06:58
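The two suggestions in the comments above can be sketched roughly as follows. The data frame `toy` is a hypothetical stand-in for `pedes.10` with made-up values; only the column names are taken from the question.

```r
# Hypothetical toy data frame standing in for pedes.10 (values are made up)
toy <- data.frame(
  Casualty_Severity = c(1, 2, 3, 3, 2),
  Sex_of_Casualty   = c(1, 2, 1, 1, 2)
)

# ifelse() is vectorized: a single call recodes the whole column, no loop needed
toy$Injury <- ifelse(toy$Casualty_Severity != "3", 1, 0)

print(toy$Injury)

# dput() prints R code that recreates the object, which is handy for
# sharing a small sample of the data in a question
dput(head(toy, 3))
```

On the real data, something like `dput(head(pedes.10, 20))` would probably be enough for others to reproduce the problem without downloading the full data set.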

0 Answers