Labelling R2 and p value in ggplot?

Question

I am trying to add the results (R2 and p value) of two parallel lm model fits to the same ggplot plot. Here is my working example:

library(ggplot2)

set.seed(100)
dat <- data.frame(
        x <- rnorm(100, 1),
        y <- rnorm(100, 10),
        lev <- gl(n = 2, k = 50, labels = letters[1:2])
        )

mod1 <- lm(y~x, dat = dat[lev %in% "a", ])
r1 <- paste("R^2==", round(summary(mod1)[[9]], 3))
p1<- paste("p==", round(summary(mod1)[[4]][2, 4], 3), sep= "")
lab1 <- paste(r1, p1, sep =",")

mod2 <- lm(y~x, dat = dat[lev %in% "b", ])
r2 <- paste("R^2==", round(summary(mod2)[[9]], 3))
p2 <- paste("p==", round(summary(mod2)[[4]][2, 4], 3), sep= "")
lab2 <- paste(r2, p2, sep =",")

ggplot(dat, aes(x = x, y = y, col = lev)) + geom_jitter() + geom_smooth(method = "lm") + annotate("text", x = 2, y = 12, label = lab1, parse = T) + annotate("text", x = 10, y = 8, label = lab2, parse = T)

Here is what the prompt shows:

Error in parse(text = text[[i]]) : <text>:1:12: unexpected ','
1: R^2== 0.008,

Now the problem is that I can label either R2 or the p value separately, but not both together. How can I put the two results on a single line in the figure? BTW, is there a more efficient way of doing the same thing as my code? I have nine subplots that I want to put into one full plot, and I don't want to add them one by one.

Update: Following @G. Grothendieck's kind suggestion, I tried to wrap the most repetitive part of the code into a function so that I could produce all the plots in a few lines. The problem now is that, whichever input variables I pass, the output plots are basically the same, except for the axis labels. Can anyone explain why? The following is the code I used:

library(ggplot2)
library(ggpubr)

set.seed(100)
dat <- data.frame(
        x = rnorm(100, 1),
        y = rnorm(100, 10),
        z = rnorm(100, 25),
        lev = gl(n = 2, k = 50, labels = letters[1:2])
        )
test <- function(dat, x, y){
fmt <- "%s: Adj ~ R^2 == %.3f * ',' ~ {p == %.3f}"

mod1 <- lm(y ~ x, dat, subset = lev == "a")
sum1 <- summary(mod1)
lab1 <- sprintf(fmt, "a", sum1$adj.r.squared, coef(sum1)[2, 4])

mod2 <- lm(y ~ x, dat, subset = lev == "b")
sum2 <- summary(mod2)
lab2 <- sprintf(fmt, "b", sum2$adj.r.squared, coef(sum2)[2, 4])

colors <- 1:2

p <- ggplot(dat, aes(x = x, y = y, col = lev)) + 
  geom_jitter() +
  geom_smooth(method = "lm") + 
  annotate("text", x = 2, y = c(12, 8), label = c(lab1, lab2), 
    parse = TRUE, hjust = 0, color = colors) +
  scale_color_manual(values = colors)
return(p)
} 

ggarrange(test(dat, x, z), test(dat, y, z))

These might help https://stackoverflow.com/questions/48912224/how-to-add-linear-lines-to-a-plot-with-multiple-data-sets-of-a-data-frame & https://stackoverflow.com/questions/52681895/ggplot2-issues-with-dual-y-axes-and-loess-smoothing — Tung, Mar 01 '20 at 06:08
Thanks Tung, I also found stat_poly_eq to be very helpful. Unfortunately, it seems that stat_poly_eq can't output p.value, so some other people used stat_fit_glance also. — Marco, Mar 02 '20 at 07:15

G. Grothendieck · Answer 1 · 2020-03-01T13:13:26.190

There are several problems here:

x, y and lev are arguments to data.frame so they must be specified using = rather than <-
make use of the subset= argument in lm
use sprintf instead of paste to simplify the specification of labels
label the text strings a and b and make them the same color as the corresponding lines to identify which is which
the formula syntax needs to be corrected. See fmt below.
it would be clearer to use component names and accessor functions of the summary objects where available
use TRUE rather than T because the latter can be overridden if there is a variable called T but TRUE can never be overridden.
use hjust=0 and adjust the x= and y= in annotate to align the two text strings
combine the annotate statements
place the individual terms of the ggplot statement on separate lines for improved readability

This gives:

library(ggplot2)

set.seed(100)
dat <- data.frame(
        x = rnorm(100, 1),
        y = rnorm(100, 10),
        lev = gl(n = 2, k = 50, labels = letters[1:2])
        )

fmt <- "%s: Adj ~ R^2 == %.3f * ',' ~ {p == %.3f}"

mod1 <- lm(y ~ x, dat, subset = lev == "a")
sum1 <- summary(mod1)
lab1 <- sprintf(fmt, "a", sum1$adj.r.squared, coef(sum1)[2, 4])

mod2 <- lm(y ~ x, dat, subset = lev == "b")
sum2 <- summary(mod2)
lab2 <- sprintf(fmt, "b", sum2$adj.r.squared, coef(sum2)[2, 4])

colors <- 1:2

ggplot(dat, aes(x = x, y = y, col = lev)) + 
  geom_jitter() +
  geom_smooth(method = "lm") + 
  annotate("text", x = 2, y = c(12, 8), label = c(lab1, lab2), 
    parse = TRUE, hjust = 0, color = colors) +
  scale_color_manual(values = colors)
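
The point about the formula syntax can be checked without drawing anything. With parse = TRUE, each label must be a single valid R expression, and a bare comma between two comparisons is not one. The sketch below compares a label built the way the question builds it with one built from fmt. The helper name parses is hypothetical, and the numeric values are placeholders rather than results from the models above.

```r
# Hypothetical helper: TRUE if the string is a valid R (plotmath) expression
parses <- function(txt) {
  tryCatch({
    parse(text = txt)
    TRUE
  }, error = function(e) FALSE)
}

# Label in the style of the question: two comparisons joined by a bare comma
bad <- paste(paste("R^2==", 0.008), paste("p==", 0.535, sep = ""), sep = ",")

# Label in the style of the answer: the comma is a quoted string, attached
# with * (juxtaposition) and followed by ~ (a space). Placeholder numbers.
fmt <- "%s: Adj ~ R^2 == %.3f * ',' ~ {p == %.3f}"
good <- sprintf(fmt, "a", 0.008, 0.535)

bad_ok <- parses(bad)    # the bare comma makes parse() fail
good_ok <- parses(good)  # the quoted comma parses
```

The braces around p == %.3f keep the second comparison in its own group, since R does not accept two chained == operators in one expression.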

Thanks @G. Grothendieck for your kind help. Now a new problem has emerged (see the update to the question). Could you also check why? Thanks! – Marco, Mar 04 '20 at 05:36
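
A likely reason for the identical plots in the updated question is that lm(y ~ x, dat) and aes(x = x, y = y) inside the function always look up the columns literally named x and y in dat, so the function arguments x and y are never used. One possible fix, sketched below, is to pass the column names as strings, build the formula with reformulate() and refer to the columns through ggplot2's .data pronoun. The names plot_pair, xvar and yvar are hypothetical, and placing the labels in the top-left corner (instead of the fixed x = 2, y = c(12, 8)) is an assumption made so that the text stays visible for any pair of variables.

```r
library(ggplot2)

set.seed(100)
dat <- data.frame(
  x = rnorm(100, 1),
  y = rnorm(100, 10),
  z = rnorm(100, 25),
  lev = gl(n = 2, k = 50, labels = letters[1:2])
)

# Hypothetical helper: xvar and yvar are column names given as strings
plot_pair <- function(dat, xvar, yvar) {
  fmt <- "%s: Adj ~ R^2 == %.3f * ',' ~ {p == %.3f}"
  form <- reformulate(xvar, response = yvar)  # e.g. z ~ x

  # One label per level of lev, fitted on that level's rows only
  lab_text <- vapply(levels(dat$lev), function(l) {
    s <- summary(lm(form, data = dat[dat$lev == l, ]))
    sprintf(fmt, l, s$adj.r.squared, coef(s)[2, 4])
  }, character(1), USE.NAMES = FALSE)

  colors <- 1:2

  ggplot(dat, aes(x = .data[[xvar]], y = .data[[yvar]], col = lev)) +
    geom_jitter() +
    geom_smooth(method = "lm") +
    # -Inf/Inf pin the text to the top-left corner; vjust stacks the two lines
    annotate("text", x = -Inf, y = Inf, label = lab_text,
      parse = TRUE, hjust = 0, vjust = c(1.5, 3.5), color = colors) +
    scale_color_manual(values = colors) +
    labs(x = xvar, y = yvar)
}

p_xz <- plot_pair(dat, "x", "z")
p_yz <- plot_pair(dat, "y", "z")
```

The two plots can then be combined as in the question with ggarrange(p_xz, p_yz) from ggpubr. For nine subplots, the same helper could be called inside lapply() over a list of column-name pairs.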

score 1 · Answer 2 · answered Mar 01 '20 at 06:13

Unless I'm misunderstanding your question, the problem is with the parse = T arguments to your annotate calls. I don't think your strings need to be parsed. Try parse = F instead, or just drop the parameter, as the default value seems to be FALSE anyway.

Hobo


If you don't use parse=TRUE then the strings won't be formatted. For example the 2 in R squared won't appear as a superscript but will appear as a circumflex followed by 2. – G. Grothendieck Mar 01 '20 at 15:32
