
I am trying to add the R² and p-value of two parallel lm fits onto the same ggplot plot. Here is my working example:

library(ggplot2)

set.seed(100)
dat <- data.frame(
        x <- rnorm(100, 1),
        y <- rnorm(100, 10),
        lev <- gl(n = 2, k = 50, labels = letters[1:2])
        )

mod1 <- lm(y~x, dat = dat[lev %in% "a", ])
r1 <- paste("R^2==", round(summary(mod1)[[9]], 3))
p1<- paste("p==", round(summary(mod1)[[4]][2, 4], 3), sep= "")
lab1 <- paste(r1, p1, sep =",")

mod2 <- lm(y~x, dat = dat[lev %in% "b", ])
r2 <- paste("R^2==", round(summary(mod2)[[9]], 3))
p2 <- paste("p==", round(summary(mod2)[[4]][2, 4], 3), sep= "")
lab2 <- paste(r2, p2, sep =",")

ggplot(dat, aes(x = x, y = y, col = lev)) + geom_jitter() + geom_smooth(method = "lm") + annotate("text", x = 2, y = 12, label = lab1, parse = T) + annotate("text", x = 10, y = 8, label = lab2, parse = T)

Here is the error it shows:

Error in parse(text = text[[i]]) : <text>:1:12: unexpected ','
1: R^2== 0.008,

Now the problem is that I can label either R² or the p-value separately, but not both together. How can I put the two results on a single line in the figure? Also, is there a more efficient way of doing what my code does? I have nine subplots that I want to combine into one full plot, and I don't want to add them one by one.
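For reference, the error can be reproduced without ggplot2. With parse = TRUE the label string is run through R's parser (hence `parse(text = text[[i]])` in the error), so it must be a single valid plotmath expression, and two expressions separated by a bare comma are not. A minimal base-R sketch; the numbers 0.008 and 0.5 are placeholders, not real model output:

```r
# The label built by paste() in the question: two plotmath terms joined by a bare comma
bad <- "R^2== 0.008,p==0.5"
bad_result <- tryCatch(parse(text = bad), error = function(e) conditionMessage(e))
cat(bad_result, "\n")  # parser error complaining about the unexpected ','

# Quoting the comma as a string and gluing it on with * (no space) and ~ (a space)
# turns the whole label into one valid plotmath expression
good <- "R^2 == 0.008 * ',' ~ p == 0.5"
good_result <- parse(text = good)
print(good_result)
```

The same idea underlies the `fmt` string used in the code further down.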

++++++++++++++++++++++++++ Some update ++++++++++++++++++++++++++++++++++

Following @G. Grothendieck's kind suggestion, I tried to wrap the most repetitive part of the code into a function so that I could produce all the plots with a few lines. Now the problem is that no matter which input variables I pass, the output plots are basically the same, except for the axis labels. Can anyone explain why? The following is the code I used:

library(ggplot2)
library(ggpubr)

set.seed(100)
dat <- data.frame(
        x = rnorm(100, 1),
        y = rnorm(100, 10),
        z = rnorm(100, 25),
        lev = gl(n = 2, k = 50, labels = letters[1:2])
        )
test <- function(dat, x, y){
  fmt <- "%s: Adj ~ R^2 == %.3f * ',' ~ {p == %.3f}"

  mod1 <- lm(y ~ x, dat, subset = lev == "a")
  sum1 <- summary(mod1)
  lab1 <- sprintf(fmt, "a", sum1$adj.r.squared, coef(sum1)[2, 4])

  mod2 <- lm(y ~ x, dat, subset = lev == "b")
  sum2 <- summary(mod2)
  lab2 <- sprintf(fmt, "b", sum2$adj.r.squared, coef(sum2)[2, 4])

  colors <- 1:2

  p <- ggplot(dat, aes(x = x, y = y, col = lev)) + 
    geom_jitter() +
    geom_smooth(method = "lm") + 
    annotate("text", x = 2, y = c(12, 8), label = c(lab1, lab2), 
      parse = TRUE, hjust = 0, color = colors) +
    scale_color_manual(values = colors)
  return(p)
}

ggarrange(test(dat, x, z), test(dat, y, z))
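A likely explanation for the near-identical plots: inside `test()`, the formula `y ~ x` is resolved against the columns of `dat` first, and `dat` has columns named `x` and `y`, so the `x` and `y` arguments are never used (the `aes(x = x, y = y)` mapping likewise finds the data columns first). The sketch below demonstrates this with `lm()` and shows one possible fix: pass column names as strings and build the formula with `reformulate()`. The names `buggy()` and `fit_labels()` and the string-argument interface are hypothetical, not part of the original code.

```r
set.seed(100)
dat <- data.frame(
  x = rnorm(100, 1),
  y = rnorm(100, 10),
  z = rnorm(100, 25),
  lev = gl(n = 2, k = 50, labels = letters[1:2])
)

# Same pattern as in test(): y and x in the formula are found in `dat` first,
# so the function arguments are silently ignored
buggy <- function(dat, x, y) coef(lm(y ~ x, data = dat))
same <- identical(buggy(dat, dat$x, dat$z), buggy(dat, dat$y, dat$z))
print(same)  # TRUE: both calls fit the dat$y ~ dat$x model

# Hypothetical fix: take column names as strings and build the formula from them
fit_labels <- function(dat, xvar, yvar) {
  fmt <- "%s: Adj ~ R^2 == %.3f * ',' ~ {p == %.3f}"
  form <- reformulate(xvar, response = yvar)  # e.g. z ~ x
  vapply(levels(dat$lev), function(l) {
    # Subset the rows directly: lm(subset = ...) evaluates `subset` in the
    # formula's environment, where `l` is not defined
    s <- summary(lm(form, data = dat[dat$lev == l, ]))
    sprintf(fmt, l, s$adj.r.squared, coef(s)[2, 4])
  }, character(1))
}

labs_xz <- fit_labels(dat, "x", "z")
labs_yz <- fit_labels(dat, "y", "z")
print(labs_xz)
print(labs_yz)
```

For the plot itself, the same string column names can be mapped with `aes(x = .data[[xvar]], y = .data[[yvar]])` (the `.data` pronoun is available in ggplot2 3.0.0 and later), and the labels passed to `annotate()` as before.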
Marco
  • These might help https://stackoverflow.com/questions/48912224/how-to-add-linear-lines-to-a-plot-with-multiple-data-sets-of-a-data-frame & https://stackoverflow.com/questions/52681895/ggplot2-issues-with-dual-y-axes-and-loess-smoothing – Tung Mar 01 '20 at 06:08
  • Thanks Tung, I also found stat_poly_eq to be very helpful. Unfortunately, it seems that stat_poly_eq can't output p.value, so some other people used stat_fit_glance also. – Marco Mar 02 '20 at 07:15

2 Answers


There are several problems here:

  • x, y and lev are arguments to data.frame so they must be specified using = rather than <-
  • make use of the subset= argument in lm
  • use sprintf instead of paste to simplify the specification of labels
  • label the text strings a and b and make them the same color as the corresponding lines to identify which is which
  • the plotmath expression syntax needs to be corrected (a bare comma between two terms is not valid). See fmt below.
  • it would be clearer to use component names and accessor functions of the summary objects where available
  • use TRUE rather than T because the latter can be overridden if there is a variable called T but TRUE can never be overridden.
  • use hjust=0 and adjust the x= and y= in annotate to align the two text strings
  • combine the annotate statements
  • place the individual terms of the ggplot statement on separate lines for improved readability

This gives:

library(ggplot2)

set.seed(100)
dat <- data.frame(
        x = rnorm(100, 1),
        y = rnorm(100, 10),
        lev = gl(n = 2, k = 50, labels = letters[1:2])
        )

fmt <- "%s: Adj ~ R^2 == %.3f * ',' ~ {p == %.3f}"

mod1 <- lm(y ~ x, dat, subset = lev == "a")
sum1 <- summary(mod1)
lab1 <- sprintf(fmt, "a", sum1$adj.r.squared, coef(sum1)[2, 4])

mod2 <- lm(y ~ x, dat, subset = lev == "b")
sum2 <- summary(mod2)
lab2 <- sprintf(fmt, "b", sum2$adj.r.squared, coef(sum2)[2, 4])

colors <- 1:2

ggplot(dat, aes(x = x, y = y, col = lev)) + 
  geom_jitter() +
  geom_smooth(method = "lm") + 
  annotate("text", x = 2, y = c(12, 8), label = c(lab1, lab2), 
    parse = TRUE, hjust = 0, color = colors) +
  scale_color_manual(values = colors)

(screenshot of the resulting plot)
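The question also mentions nine subplots. One way to avoid writing a mod1/lab1, mod2/lab2 pair per group is to compute every label at once into a small data frame, one row per level, and draw it with `geom_text()` instead of `annotate()`. A minimal sketch of the label step; it recreates `dat` and `fmt` from the answer so it runs on its own, and the names `label_df` and `ypos` and the vertical spacing are illustrative choices, not from the original.

```r
set.seed(100)
dat <- data.frame(
  x = rnorm(100, 1),
  y = rnorm(100, 10),
  lev = gl(n = 2, k = 50, labels = letters[1:2])
)
fmt <- "%s: Adj ~ R^2 == %.3f * ',' ~ {p == %.3f}"

# One lm fit and one plotmath label per level of lev, however many levels there are
label_rows <- lapply(levels(dat$lev), function(l) {
  s <- summary(lm(y ~ x, data = dat[dat$lev == l, ]))
  data.frame(
    lev = l,
    label = sprintf(fmt, l, s$adj.r.squared, coef(s)[2, 4]),
    stringsAsFactors = FALSE
  )
})
label_df <- do.call(rbind, label_rows)

# Stack the labels downward from the top of the data: illustrative positions only
label_df$ypos <- max(dat$y) - 0.5 * (seq_len(nrow(label_df)) - 1)
print(label_df)
```

The data frame could then replace the `annotate()` call with something like `geom_text(data = label_df, aes(x = 2, y = ypos, label = label, colour = lev), parse = TRUE, hjust = 0, inherit.aes = FALSE)`; because colour is mapped to `lev`, each label takes the colour of its line.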

G. Grothendieck
  • Thanks @G. Grothendieck for your kind help. Now a new problem has emerged; could you also check why? Thanks! – Marco Mar 04 '20 at 05:36

Unless I'm misunderstanding your question, the problem is with the parse = T arguments to your annotate calls. I don't think your strings need to be parsed. Try parse = F instead, or just drop the parameter, as the default value seems to be FALSE anyway.

Hobo
  • If you don't use parse=TRUE then the strings won't be formatted. For example the 2 in R squared won't appear as a superscript but will appear as a circumflex followed by 2. – G. Grothendieck Mar 01 '20 at 15:32
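If parsing is skipped as this answer suggests, a superscript 2 can still be shown by writing the label as ordinary text with the Unicode superscript-two character (U+00B2) instead of plotmath. A sketch, assuming the graphics device and font can display that character; `lab_plain` is an illustrative name, and the model is refit as in the first answer:

```r
set.seed(100)
dat <- data.frame(
  x = rnorm(100, 1),
  y = rnorm(100, 10),
  lev = gl(n = 2, k = 50, labels = letters[1:2])
)

s <- summary(lm(y ~ x, data = dat, subset = lev == "a"))

# Plain-text label: "\u00b2" is the superscript-two character, so no plotmath
# parsing is needed and annotate(..., parse = FALSE) can draw it as-is
lab_plain <- sprintf("a: adj. R\u00b2 = %.3f, p = %.3f",
                     s$adj.r.squared, coef(s)[2, 4])
print(lab_plain)
```

This trades plotmath's full typesetting for a string that needs no parsing; for anything beyond a superscript 2, the plotmath route with parse = TRUE remains the more general option.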