
I'm using base R to test this model:

probabilities <- predict(theModel, newdata = dataToModel2, type = "response")
dataToModel2$predictions <- ifelse(probabilities >= .5, "True", "False")

and then when I try to test for accuracy using this code:

 accuracy <- sum(dataToModel2$predictions == dataToModel2$incomeNum)/dim(dataToModel2)[1]

I get 0 rather than a number indicating how accurate my model is. Why is this, and how do I fix it?

I hope this helps. Data for the original model:

dataToModel <- structure(
  list(
    sex = c("Male", "Male", "Male", "Male", "Female"),
    marital.status = c("Never-married", "married", "pMarried",
                       "married", "married"),
    race = c("White", "White", "White", "Black",
             "Black"),
    education = c(
      "University",
      "University",
      "less-than-Uni",
      "less-than-Uni",
      "University"
    ),
    incomeNum = c(FALSE, FALSE, FALSE,
                  FALSE, FALSE)
  ),
  row.names = c(NA, 5L),
  class = "data.frame"
)

And data for predictions:

dataToModel2 <- structure(
  list(
    sex = c("Male", "Male", "Male", "Male", "Male"),
    marital.status = c(
      "Never-married",
      "married",
      "married",
      "married",
      "Never-married"
    ),
    race = c("Black", "White", "White",
             "Black", "White"),
    education = c(
      "less-than-Uni",
      "less-than-Uni",
      "University",
      "less-than-Uni",
      "less-than-Uni"
    ),
    incomeNum = c(FALSE,
                  FALSE, FALSE, FALSE, FALSE),
    predictions = c("False", "False",
                    "True", "False", "False")
  ),
  row.names = c(1L, 2L, 3L, 4L, 6L),
  class = "data.frame"
)
Shawn Hemelstrand
  • because you might be comparing things that are not equivalent. Consider having a confusion matrix instead, and then accuracy will be the proportion of the diagonal elements over everything else – Onyambu Oct 20 '22 at 22:36
  • Impossible to say without [a reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) which includes the data and all the relevant code. – neilfws Oct 20 '22 at 22:36

1 Answer


Just a guess, because this is incomplete, but your logic looks flawed here:

accuracy <- sum(dataToModel2$predictions == dataToModel2$incomeNum)/dim(dataToModel2)[1]

You assigned a "True"/"False" string as the prediction, but incomeNum suggests the actual values are stored as a different class, maybe 0/1 or a logical.

If you use the actual predicted class instead of the probability, it will likely match one of the classes you fed into the model, and that comparison can give you a meaningful accuracy.

EDIT: After seeing the data you added, the reason is clear. It is subtle, but meaningful.

In R (and every language I have used except SQL), case matters! If you declare a variable pickles and later refer to it as Pickles, R does not see them as the same thing. This is true of column names, category levels, and indexes as well.

Although not the same in construction, this also applies to boolean values. In R, the logical value for true is written as unquoted TRUE. You have created the quoted string "True", which is a title-cased character value, whereas TRUE is a logical (boolean) value.

Thus when you do your comparison, they are not equal.
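Using the prediction and outcome columns from the data you posted, you can reproduce the zero accuracy directly:

```r
preds  <- c("False", "False", "True", "False", "False")  # character strings
actual <- c(FALSE, FALSE, FALSE, FALSE, FALSE)           # logical values

# R coerces the logicals to "TRUE"/"FALSE" before comparing strings,
# so "False" != "FALSE" and "True" != "TRUE" -- nothing ever matches:
preds == actual                        # FALSE FALSE FALSE FALSE FALSE
sum(preds == actual) / length(actual)  # 0, the reported "accuracy"
```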

Try running "True" == TRUE in R and you will see FALSE: R coerces TRUE to the string "TRUE" before comparing, and "True" is not "TRUE". So, if you rewrite the code:

dataToModel2$predictions <- ifelse(probabilities >= .5, "True", "False")

to be this:

dataToModel2$predictions <- ifelse(probabilities >= .5, TRUE, FALSE)

You should be good to go!
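Put together, a minimal sketch of the corrected pipeline (the probabilities here are stand-in values, since theModel itself is not shown):

```r
# Stand-in for predict(theModel, newdata = dataToModel2, type = "response"):
probabilities <- c(0.1, 0.2, 0.7, 0.3, 0.4)
incomeNum     <- c(FALSE, FALSE, FALSE, FALSE, FALSE)

# probabilities >= .5 already returns a logical vector, so ifelse() is optional:
predictions <- probabilities >= .5

# Now both sides of == are logical, so the comparison behaves as intended:
accuracy <- sum(predictions == incomeNum) / length(incomeNum)
accuracy   # 0.8 with these stand-in values
```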

sconfluentus