
I would like to add the regression line equation and R-squared value to my ggplot2 scatter plot.

I have found a similar question, which gives the code below, but it doesn't work when I force the regression through the origin:

library(ggplot2)
library(devtools)
source_gist("524eade46135f6348140")
df = data.frame(x = c(1:100))
df$y = 2 + 5 * df$x + rnorm(100, sd = 40)
ggplot(data = df, aes(x = x, y = y, label=y)) +
  stat_smooth_func(geom="text",method="lm",hjust=0,parse=TRUE, formula=y~x-1) +
  geom_smooth(method="lm",se=FALSE, formula=y~x-1) +
  geom_point()

After adding formula=y~x-1, the displayed text shows the slope coefficient in place of the intercept and NA in place of the slope. Is there a fix for this?
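A likely explanation, assuming the label code in the gist picks coefficients by position (first entry as intercept, second as slope): a model fitted without an intercept has only one coefficient, so the second position is empty. A minimal base R sketch of that effect, using the same simulated data with a seed added for reproducibility:

```r
# Simulated data as in the question (the seed is an addition for reproducibility)
set.seed(1)
df <- data.frame(x = 1:100)
df$y <- 2 + 5 * df$x + rnorm(100, sd = 40)

with_intercept <- coef(lm(y ~ x, data = df))      # two entries: (Intercept), x
through_origin <- coef(lm(y ~ x - 1, data = df))  # one entry: x

names(with_intercept)
names(through_origin)

# Position-based lookup, as a label helper written for y ~ x might do (assumption)
through_origin[1]  # the slope, which would land in the intercept slot
through_origin[2]  # NA, which would land in the slope slot
```

Looking coefficients up by name, for example coef(fit)["x"], avoids the mix-up.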

sym246
  • I'm not sourcing some unknown gist. If you found the code in a SO question, link that question. Even better, simply provide the source code of `stat_smooth_func` in your question. – Roland Mar 11 '16 at 13:46
  • http://stackoverflow.com/questions/7549694/ggplot2-adding-regression-line-equation-and-r2-on-graph – sym246 Mar 11 '16 at 13:53
  • The above link is where I found the code referenced in the question – sym246 Mar 11 '16 at 13:57

2 Answers

8

An option is geom_smooth(method="lm",formula=y~0+x).
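For what it's worth, y ~ 0 + x and y ~ x - 1 are two spellings of the same no-intercept model, so this changes how the formula is written rather than what is fitted. A quick base R check, with the data simulated as in the question (the seed is an addition):

```r
# Same simulated data as in the question (seed added for reproducibility)
set.seed(1)
df <- data.frame(x = 1:100)
df$y <- 2 + 5 * df$x + rnorm(100, sd = 40)

fit_minus <- lm(y ~ x - 1, data = df)  # formula used in the question
fit_zero  <- lm(y ~ 0 + x, data = df)  # formula suggested in this answer

# Both fits have a single coefficient (the slope) and it is the same value
all.equal(coef(fit_minus), coef(fit_zero))
```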

4

In this simple case (without faceting or grouping), you don't need to create a new stat_*. You can simply do this:

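# Fit the regression through the origin (df is the data frame from the question)
fit <- lm(y ~ x - 1, data = df)
ggplot(data = df, aes(x = x, y = y, label=y)) +
  # Draw the fitted line from the model's predictions
  stat_function(fun = function(x) predict(fit, newdata = data.frame(x = x)),
                color = "blue", size = 1.5) +
  # Print the slope and R² at a fixed position on the plot
  annotate(label = sprintf("y = %.3f x\nR² = %.2f", coef(fit), summary(fit)$r.squared),
           geom = "text", x = 25, y = 400, size = 12) +
  geom_point()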

[Image: resulting plot]

Of course, the stat_* function from the gist would be easy to adjust for regression through the origin.
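As a rough sketch of what such an adjustment could look like (this is not the gist's code; origin_lm_label is a hypothetical helper name), the label can be built from the single slope coefficient, looked up by name, and formatted as a plotmath string:

```r
# Hypothetical helper, not taken from the gist: builds a plotmath label
# for a least-squares fit through the origin on columns x and y.
origin_lm_label <- function(data) {
  fit <- lm(y ~ x - 1, data = data)
  slope <- unname(coef(fit)["x"])  # look the slope up by name, not by position
  r2 <- summary(fit)$r.squared
  sprintf('italic(y) == %.3g %%.%% italic(x) * "," ~~ italic(r)^2 ~ "=" ~ %.3g',
          slope, r2)
}

# Same simulated data as in the question (seed added for reproducibility)
set.seed(1)
df <- data.frame(x = 1:100)
df$y <- 2 + 5 * df$x + rnorm(100, sd = 40)

label <- origin_lm_label(df)
label
```

The resulting string should be usable wherever a parsed label is accepted, for example annotate("text", x = 25, y = 400, label = label, parse = TRUE), or as the value computed per group inside a custom stat.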

Off-topic comment: Regression without an intercept is very rarely sensible from a statistical point of view.

Roland
  • In the context of my data, a value of 0 for one variable means the other has to be 0 as well. In any case, the R-squared value is markedly improved when regressing through the origin for my data. Thank you for your answer - it's done the trick! – sym246 Mar 11 '16 at 14:14
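A note on the R-squared comparison: for a model without an intercept, summary.lm measures the residual sum of squares against the sum of squared y values rather than against the variation around the mean of y, so a higher value is expected and is not directly comparable with the R-squared of a model that includes an intercept. A small base R sketch, with data simulated as in the question (the seed is an addition):

```r
# Same simulated data as in the question (seed added for reproducibility)
set.seed(1)
df <- data.frame(x = 1:100)
df$y <- 2 + 5 * df$x + rnorm(100, sd = 40)

fit <- lm(y ~ x - 1, data = df)
rss <- sum(residuals(fit)^2)

r2_reported   <- summary(fit)$r.squared
r2_uncentered <- 1 - rss / sum(df$y^2)                 # baseline: y = 0
r2_centered   <- 1 - rss / sum((df$y - mean(df$y))^2)  # baseline: y = mean(y)

# The reported value matches the uncentered definition,
# which can never be lower than the centered one
c(reported = r2_reported, uncentered = r2_uncentered, centered = r2_centered)
```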