In sklearn there is a function, `sklearn.metrics.r2_score(y_true, y_pred)`, which takes two arrays and calculates r^2. Is there something similar in R? I've found some functions, but they only work for GLMs. I have a test set and test predictions from KNN regression that I want to calculate r^2 for. Am I going to have to hand-code this?

wordsforthewise
- Here is a related question: https://stackoverflow.com/q/40901445/4549682 One answer was this: `rsq <- function (x, y) cor(x, y) ^ 2` but some assumptions must be true for that to hold – wordsforthewise Jan 18 '18 at 06:12
- Probably overkill but check package [hydroGOF](https://cran.r-project.org/web/packages/hydroGOF/hydroGOF.pdf) – Tung Jan 18 '18 at 06:33
- check out `caret::postResample` – missuse Jan 18 '18 at 06:38
- postResample is the answer, thanks. Why is it called 'resample' though, I don't think it's actually resampling, is it? https://en.wikipedia.org/wiki/Resampling_(statistics) – wordsforthewise Jan 18 '18 at 19:01
- also, postResample is using the correlation ^2 approximation, not the actual equation for r^2 – wordsforthewise Jan 19 '18 at 00:56
1 Answer
It is not obvious, but the `caret` package has a function, `postResample()`, that will calculate "A vector of performance estimates" according to the documentation (really helpful documentation). The "performance estimates" are
- RMSE
- Rsquared
- mean absolute error (MAE)
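and have to be accessed from the vector like this:

    library(caret)
    vect1 <- c(1, 2, 3)  # passed as the predictions (first argument, pred)
    vect2 <- c(3, 2, 2)  # passed as the observed values (second argument, obs)
    res <- caret::postResample(vect1, vect2)
    rsq <- res["Rsquared"]  # same element as res[2], selected by name
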
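However, this uses the squared correlation between the predictions and the observed values rather than the conventional 1 - SSE/SST. The two agree only under certain assumptions (for example, a least-squares linear fit with an intercept, evaluated on its own training data), so they can differ on test-set predictions. Why they didn't just use 1 - SSE/SST is beyond me.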
The way to implement the normal coefficient of determination equation is:

    preds <- c(1, 2, 3)
    actual <- c(2, 2, 4)
    # residual sum of squares (SSE)
    rss <- sum((preds - actual) ^ 2)
    # total sum of squares around the mean of the observed values (SST)
    tss <- sum((actual - mean(actual)) ^ 2)
    rsq <- 1 - rss/tss
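
To get something closer to the sklearn call from the question, the same calculation can be wrapped in a small function. This is only a sketch in base R: the name `r2_score` and its argument order are borrowed from sklearn for illustration and are not part of `caret` or any other R package. It also prints the squared correlation for the same vectors, to show how far the two numbers can be apart.

```r
# Hypothetical helper mirroring sklearn.metrics.r2_score(y_true, y_pred).
# The function name and argument order are assumptions made for this example.
r2_score <- function(y_true, y_pred) {
  stopifnot(length(y_true) == length(y_pred))
  rss <- sum((y_true - y_pred)^2)        # residual sum of squares (SSE)
  tss <- sum((y_true - mean(y_true))^2)  # total sum of squares (SST)
  1 - rss / tss
}

actual <- c(2, 2, 4)
preds  <- c(1, 2, 3)

r2_1_minus_sse_sst <- r2_score(actual, preds)  # about 0.25
r2_squared_cor     <- cor(actual, preds)^2     # about 0.75

print(c(r2_score = r2_1_minus_sse_sst, cor_squared = r2_squared_cor))
```

For these vectors the 1 - SSE/SST value is about 0.25 while the squared correlation is about 0.75, because the correlation ignores the bias in the predictions.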

wordsforthewise