Getting formula for regression line in R

Question

There are a couple of answers on Stack Overflow that show how to return a formula for a regression line, but I cannot figure out how to get them to work for my data and give me a formula.

Here is one example I am trying to use; it takes a data frame.

In my code, I am reading in data from a table:

vafs <- read.table("outputFile", header = TRUE)
sample1 <- vafs$Sample1
sample2 <- vafs$Sample2

And then plotting generally like this:

lm_eqn <- function(df,y,x){
    m <- lm(y ~ x, df);
    eq <- substitute(italic(y) == a + b %.% italic(x)*","~~italic(r)^2~"="~r2,
         list(a = format(coef(m)[1], digits = 2),
              b = format(coef(m)[2], digits = 2),
             r2 = format(summary(m)$r.squared, digits = 3)))
    as.character(as.expression(eq));
}

p <- ggplot(vafs, aes(x=sample1, y=sample2, alpha=0.5, label=identity, size=15)) +
  geom_text(aes(x = 10, y = 300, label = lm_eqn(vafs, sample1, sample2)), parse = TRUE)+
  geom_point() +
  xlim(0,0.003) +
  ylim(0,0.003) +
  geom_abline(intercept = 0, slope = 1, size=3)+ # y=x line
  xlab('\nVAF Individual 1') + ylab('VAF Individual 2\n') +
  labs(title = 'Multiplier = 1x\n')+
  geom_smooth(method=lm, se=TRUE, size=3, colour='red')+ # regression line
  theme_bw()+ # no gray background
  theme(panel.border = element_blank())+ # no border
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank())+ # no gridlines
  theme(axis.title = element_text(size = 50))+ # change label size
  theme(plot.title = element_text(size = 50))+ # change title size
  theme(plot.title = element_text(hjust = 0.5))+ # center title
  theme(axis.text.x = element_text(size = 50, colour="black", angle=90))+ # change tick size
  theme(axis.text.y = element_text(size = 50, colour="black"))+ # change tick size
  theme(legend.position="none")+ # no legend
  theme(axis.ticks = element_line(colour = "black", size = 2))+ # black, thicker ticks
  theme(axis.line = element_line(colour = "black", size=3)) # add axis
jpeg("output2.jpg", units="in", width=17, height=17, res=500)
print(p)
dev.off()

However, when I try the method from the answer linked above (also shown below), I can't figure out how to pass it my data properly; simply passing `vafs` does not work.

lm_eqn <- function(df){
    m <- lm(y ~ x, df);
    eq <- substitute(italic(y) == a + b %.% italic(x)*","~~italic(r)^2~"="~r2, 
         list(a = format(coef(m)[1], digits = 2), 
              b = format(coef(m)[2], digits = 2), 
             r2 = format(summary(m)$r.squared, digits = 3)))
    as.character(as.expression(eq));                 
}

p1 <- p + geom_text(x = 25, y = 300, label = lm_eqn(df), parse = TRUE)
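The one-argument `lm_eqn(df)` calls `lm(y ~ x, df)`, so it only works when `df` has columns literally named `x` and `y`. `vafs` has `Sample1` and `Sample2` instead, and the bare name `df` in `lm_eqn(df)` is not a data frame at all: without a user-defined `df`, R finds `stats::df`, the F-distribution density function. A minimal sketch, assuming the three sample rows from the question stand in for the real `vafs` and using a hypothetical helper data frame `eq_df`, is to rename the columns before calling the function:

```r
# One-argument version from the linked answer: df must have columns named x and y
lm_eqn <- function(df){
  m <- lm(y ~ x, df)
  eq <- substitute(italic(y) == a + b %.% italic(x)*","~~italic(r)^2~"="~r2,
                   list(a = format(coef(m)[1], digits = 2),
                        b = format(coef(m)[2], digits = 2),
                        r2 = format(summary(m)$r.squared, digits = 3)))
  as.character(as.expression(eq))
}

# Assumption: the three sample rows from the question stand in for vafs
vafs <- data.frame(
  Sample1 = c(0.000100576639399, 0.000201153278798, 6.70510929328e-05),
  Sample2 = c(7.01336045166e-05, 0.000263732817381, 0.000109685906595)
)

# Hypothetical helper: x = variable on the x axis, y = variable on the y axis
eq_df <- data.frame(x = vafs$Sample1, y = vafs$Sample2)
eq_label <- lm_eqn(eq_df)
print(eq_label)
```

The returned string can then be used as `label = eq_label` in `annotate(geom = "text", ..., parse = TRUE)`, with coordinates inside the plotted axis range.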

Here are a few rows of sample input data, to show what I'm taking as input:

Sample1 Sample2 Identity
0.000100576639399   7.01336045166e-05   label
0.000201153278798   0.000263732817381   label
6.70510929328e-05   0.000109685906595   label
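Since `outputFile` is not available to others, one way to reproduce these rows is `read.table()` with its `text` argument, which reads from a string instead of a file. A minimal sketch, assuming the columns are whitespace-separated exactly as shown:

```r
# Read the sample rows from the question directly from a string
vafs <- read.table(text = "
Sample1 Sample2 Identity
0.000100576639399   7.01336045166e-05   label
0.000201153278798   0.000263732817381   label
6.70510929328e-05   0.000109685906595   label
", header = TRUE)

# Two numeric columns; Identity is character in R >= 4.0 (a factor in older R)
str(vafs)

# Column names are case-sensitive: vafs$Sample1 exists, vafs$sample1 is NULL
names(vafs)
```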

missuse · Accepted Answer · 2017-09-26T22:43:23.080

I got it working by adding `y` and `x` arguments to the `lm_eqn` function:

library(ggplot2)

# Define lm_eqn before the plot is printed, otherwise R cannot find the function
lm_eqn <- function(df, y, x){
  # y and x are vectors; lm() falls back to them because df has no x or y columns
  m <- lm(y ~ x, df)
  eq <- substitute(italic(y) == a + b %.% italic(x)*","~~italic(r)^2~"="~r2, 
                   list(a = format(coef(m)[1], digits = 2), 
                        b = format(coef(m)[2], digits = 2), 
                        r2 = format(summary(m)$r.squared, digits = 3)))
  as.character(as.expression(eq))
}

ggplot(cars)+
  geom_point(aes(x=speed, y =dist ))+
  geom_text(aes(x = 10, y = 300, label = lm_eqn(cars, dist, speed)), parse = TRUE)
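Because `lm_eqn()` just returns a plotmath string, it can be checked outside of a plot before it is used as a label. A small sketch with the built-in `cars` data; note that the response is the second argument and the predictor the third, following the `lm_eqn(df, y, x)` signature above:

```r
lm_eqn <- function(df, y, x){
  m <- lm(y ~ x, df)
  eq <- substitute(italic(y) == a + b %.% italic(x)*","~~italic(r)^2~"="~r2,
                   list(a = format(coef(m)[1], digits = 2),
                        b = format(coef(m)[2], digits = 2),
                        r2 = format(summary(m)$r.squared, digits = 3)))
  as.character(as.expression(eq))
}

# Response (dist) before predictor (speed), as in lm_eqn(df, y, x)
label <- lm_eqn(cars, cars$dist, cars$speed)
print(label)

# The same fit written the usual way, to compare coefficients and r^2
fit <- lm(dist ~ speed, data = cars)
print(coef(fit))
print(summary(fit)$r.squared)
```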

EDIT: with `annotate` I had to pass the `y` and `x` vectors to `lm_eqn` explicitly (`cars$dist`, `cars$speed`), because `annotate` does not evaluate its arguments inside the data the way `aes` does. The improvement in the look is worth it, though:

ggplot(cars)+
  geom_point(aes(x=speed, y =dist ))+
  annotate(geom = "text",x = 10, y = 300, label = lm_eqn(cars, cars$dist, cars$speed), parse = TRUE)
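The better look comes from the fact that `geom_text()` with a mapped label draws one copy of the label per row of the data, all stacked at the same position, while `annotate()` draws a single label. A sketch that makes this visible with `ggplot2::layer_data()`, which returns the data a given layer actually draws (assuming ggplot2 is installed):

```r
library(ggplot2)

lm_eqn <- function(df, y, x){
  m <- lm(y ~ x, df)
  eq <- substitute(italic(y) == a + b %.% italic(x)*","~~italic(r)^2~"="~r2,
                   list(a = format(coef(m)[1], digits = 2),
                        b = format(coef(m)[2], digits = 2),
                        r2 = format(summary(m)$r.squared, digits = 3)))
  as.character(as.expression(eq))
}

# geom_text(): the single label is recycled to every row of cars
p_text <- ggplot(cars) +
  geom_point(aes(x = speed, y = dist)) +
  geom_text(aes(x = 10, y = 300, label = lm_eqn(cars, dist, speed)), parse = TRUE)

# annotate(): one label; the x and y vectors are passed explicitly
p_annot <- ggplot(cars) +
  geom_point(aes(x = speed, y = dist)) +
  annotate(geom = "text", x = 10, y = 300,
           label = lm_eqn(cars, cars$dist, cars$speed), parse = TRUE)

# Layer 2 is the label layer in both plots
nrow(layer_data(p_text, 2))   # one row per row of cars (overplotted text)
nrow(layer_data(p_annot, 2))  # a single row
```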

answered Sep 26 '17 at 21:03 by missuse, edited Sep 26 '17 at 22:43

It's good practice to replace the `geom_text(...)` with `annotate(geom = "text", ...)`. The former plots one instance per row of the data, resulting in bold/blurry text, while the latter plots only one label. – Brian Sep 26 '17 at 21:25
@missuse This actually seems to almost be working: everything runs and I get a plot out, but no formula is displayed on the plot. Am I calling the method improperly? – The Nightman Sep 26 '17 at 21:28
@Brian thanks, added an edit with your suggestion. It does look much better. The Nightman (philly rocks!), does it work with data(cars) as in my example or not? I am using ggplot2 2.2.1. In geom_text I could make it work only if `x`, `y` and `label` are in `aes`. Can you provide a sample of your data with dput? – missuse Sep 26 '17 at 22:07
@missuse (Sunny is the best haha) and yeah I can replicate exactly what you have done, so I'm not sure what the difference is. I have added the first few rows of data so you can get the idea of what it looks like. – The Nightman Sep 26 '17 at 22:14
@The Nightman try `label = lm_eqn(vafs, Sample1, Sample2)`, since `sample1` and `sample2` are not columns of `vafs`. With `annotate` it should not be a problem if you create the global variables `sample1` and `sample2` as you have shown. – missuse Sep 26 '17 at 22:19
@missuse it still runs and produces a plot, but the formula still does not show up on it. – The Nightman Sep 26 '17 at 22:33
@The Nightman you have `x = 10, y = 300` in `geom_text` while the axes are restricted by `xlim(0,0.003)` and `ylim(0,0.003)`, so the label lies outside the plotted range. Fix the `geom_text` coordinates. Improvement? – missuse Sep 26 '17 at 22:40
@missuse Perfect! Yeah the coordinates were off. Thanks for the help. – The Nightman Sep 27 '17 at 15:54
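Putting the thread's fixes together for the question's data: a minimal sketch, assuming the three sample rows stand in for the real `vafs`. The label coordinates are in data units, so they have to fall inside `xlim(0, 0.003)` and `ylim(0, 0.003)`; values outside the limits are treated as missing and dropped, which is why a label at `x = 10, y = 300` never appeared. The position `x = 0.001, y = 0.0028` is an arbitrary choice. The sketch also passes `Sample2` as the response: `lm_eqn(df, y, x)` takes the response first, so a call like `lm_eqn(vafs, Sample1, Sample2)` describes the regression of `Sample1` on `Sample2` rather than the line that `geom_smooth(method = lm)` draws for `x = Sample1, y = Sample2`.

```r
library(ggplot2)

lm_eqn <- function(df, y, x){
  m <- lm(y ~ x, df)
  eq <- substitute(italic(y) == a + b %.% italic(x)*","~~italic(r)^2~"="~r2,
                   list(a = format(coef(m)[1], digits = 2),
                        b = format(coef(m)[2], digits = 2),
                        r2 = format(summary(m)$r.squared, digits = 3)))
  as.character(as.expression(eq))
}

# Assumption: the three sample rows from the question stand in for vafs
vafs <- data.frame(
  Sample1 = c(0.000100576639399, 0.000201153278798, 6.70510929328e-05),
  Sample2 = c(7.01336045166e-05, 0.000263732817381, 0.000109685906595)
)

# Response first: Sample2 is on the y axis, Sample1 on the x axis
eq_label <- lm_eqn(vafs, vafs$Sample2, vafs$Sample1)

p <- ggplot(vafs, aes(x = Sample1, y = Sample2)) +
  geom_point() +
  xlim(0, 0.003) +
  ylim(0, 0.003) +
  # Coordinates in data units, chosen (arbitrarily) inside the axis limits
  annotate(geom = "text", x = 0.001, y = 0.0028, label = eq_label, parse = TRUE)

print(p)
```

The asker's `geom_smooth()`, `geom_abline()` and theme layers can be added to `p` unchanged.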
