
In sklearn there is a function sklearn.metrics.r2_score(y_true, y_pred) where I can give it two arrays and it calculates r^2. Is there something similar in R? I've found some functions but they are only for GLMs. I have a test set and test predictions from KNN regression that I want to calculate r^2 for. Am I going to have to hand-code this?

wordsforthewise
  • Here is a related question: https://stackoverflow.com/q/40901445/4549682 One answer was this: `rsq <- function (x, y) cor(x, y) ^ 2` but some assumptions must be true for that to hold – wordsforthewise Jan 18 '18 at 06:12
  • Probably overkill but check package [hydroGOF](https://cran.r-project.org/web/packages/hydroGOF/hydroGOF.pdf) – Tung Jan 18 '18 at 06:33
  • check out `caret::postResample` – missuse Jan 18 '18 at 06:38
  • postResample is the answer, thanks. Why is it called 'resample' though, I don't think it's actually resampling, is it? https://en.wikipedia.org/wiki/Resampling_(statistics) – wordsforthewise Jan 18 '18 at 19:01
  • also, postResample is using the correlation ^2 approximation, not the actual equation for r^2 – wordsforthewise Jan 19 '18 at 00:56

1 Answer


It is not obvious, but the caret package has a function, postResample(), that returns "A vector of performance estimates" according to the documentation (really helpful documentation). The "performance estimates" are

  • RMSE
  • Rsquared
  • mean absolute error (MAE)

and have to be accessed from the vector like this

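library(caret)
vect1 <- c(1, 2, 3)  # passed first, so treated as predictions (pred)
vect2 <- c(3, 2, 2)  # passed second, so treated as observations (obs)
# postResample() returns a named vector: RMSE, Rsquared, MAE
res <- caret::postResample(vect1, vect2)
rsq <- res["Rsquared"]  # same element as res[2]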

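However, this computes r-squared as the squared correlation between the predictions and the observations. That equals 1 - SSE/SST only in special cases, such as a least-squares linear fit with an intercept evaluated on its own training data; on a held-out test set the two can differ. Why they didn't just use the conventional 1 - SSE/SST is beyond me.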
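To see how far the two definitions can drift apart, here is a small sketch on made-up numbers (the vectors are hypothetical, chosen only so the effect is obvious). The predictions are the observations shifted by a constant, so they are perfectly correlated but badly biased:

```r
# Hypothetical toy data: every prediction is off by exactly 10.
actual <- c(1, 2, 3, 4, 5)
preds  <- actual + 10

# Squared Pearson correlation ignores the constant bias entirely.
r2_corr <- cor(preds, actual)^2

# 1 - SSE/SST penalizes the bias and goes strongly negative.
sse <- sum((actual - preds)^2)
sst <- sum((actual - mean(actual))^2)
r2_trad <- 1 - sse / sst

print(c(correlation_squared = r2_corr, one_minus_sse_sst = r2_trad))
```

The squared correlation comes out as 1, while 1 - SSE/SST comes out as -49, so the choice of definition matters when scoring test-set predictions.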

The conventional coefficient of determination, 1 - SSE/SST, can be computed directly:

preds <- c(1, 2, 3)
actual <- c(2, 2, 4)
rss <- sum((preds - actual) ^ 2)          # residual sum of squares (SSE)
tss <- sum((actual - mean(actual)) ^ 2)   # total sum of squares (SST)
rsq <- 1 - rss/tss                        # 0.25 for these vectors
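
If this is needed in more than one place, the same calculation can be wrapped in a small helper. This is only a sketch: the name r2_score and the argument order (y_true, y_pred) are borrowed from the sklearn function mentioned in the question, not from any R package, and the input checks are an assumption about what is useful.

```r
# Hypothetical helper mirroring sklearn.metrics.r2_score(y_true, y_pred).
r2_score <- function(y_true, y_pred) {
  # Basic sanity checks on the inputs.
  stopifnot(is.numeric(y_true), is.numeric(y_pred),
            length(y_true) == length(y_pred))
  rss <- sum((y_true - y_pred)^2)          # residual sum of squares
  tss <- sum((y_true - mean(y_true))^2)    # total sum of squares
  1 - rss / tss
}

# Same vectors as above: observations first, predictions second.
r2_score(c(2, 2, 4), c(1, 2, 3))  # 0.25
```

Note that if every value in y_true is identical, tss is zero and the result is not a finite number, so that case needs separate handling if it can occur.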
wordsforthewise