
What is the best (easiest) approach to add neatly to a ggplot plot the regression equation, the R2, and the p-value (for the equation)? Ideally it should be compatible with groups and faceting.

This first plot has the regression equation plus the R2 and p-value by group using ggpubr, but they are not aligned. Am I missing something? Could they be included as one string?

library(ggplot2)
library(ggpubr)

ggplot(mtcars, aes(x = wt, y = mpg, group = cyl))+
  geom_smooth(method="lm")+
  geom_point()+
  stat_regline_equation()+
  stat_cor(aes(label = paste(..rr.label.., ..p.label.., sep = "*`,`~")),
           label.x.npc = "centre")

plot1

Here is an option with ggpmisc, that does some odd placement.
EDIT: The odd placement was caused by geom = 'text', which I've commented out to get better placement; I also added `label.x = "right"` to stop overplotting. We still have misalignment, as with ggpubr, due to the superscript issue flagged by @dc37.

#https://stackoverflow.com/a/37708832/4927395
library(ggpmisc)

ggplot(mtcars, aes(x = wt, y = mpg, group = cyl))+
  geom_smooth(method="lm")+
  geom_point()+
  stat_poly_eq(formula = "y~x", 
             aes(label = paste(..eq.label.., ..rr.label.., sep = "*`,`~")), 
             parse = TRUE)+
  stat_fit_glance(method = 'lm',
                  method.args = list(formula = "y~x"),
                  #geom = 'text',
                  label.x = "right", # added to stop overplotting
                  aes(label = paste("P-value = ", signif(..p.value.., digits = 4), sep = "")))

plot2_edited

I did find a good solution for bringing the relevant stats together, but it requires fitting the regression outside ggplot and a pile of string manipulation. Is this as easy as it gets? Also, as currently coded, it doesn't deal with the grouping and wouldn't deal with faceting.

#https://stackoverflow.com/a/51974753/4927395
#Solution as one string, equation, R2 and p-value
lm_eqn <- function(df, y, x){
  formula = as.formula(sprintf('%s ~ %s', y, x))
  m <- lm(formula, data=df);
  # format the values into a summary string to print out
  # ~ gives some space, but the equals sign and comma need to be quoted
  eq <- substitute(italic(target) == a + b %.% italic(input)*","~~italic(r)^2~"="~r2*","~~p~"="~italic(pvalue), 
                   list(target = y,
                        input = x,
                        a = format(as.vector(coef(m)[1]), digits = 2), 
                        b = format(as.vector(coef(m)[2]), digits = 2), 
                        r2 = format(summary(m)$r.squared, digits = 3),
                        # getting the pvalue is painful
                        pvalue = format(summary(m)$coefficients[2,'Pr(>|t|)'], digits=1)
                   )
  )
  as.character(as.expression(eq));                 
}

ggplot(mtcars, aes(x = wt, y = mpg, group=cyl))+
  geom_point() +
  # lm_eqn() takes (df, y, x): pass the response 'mpg' first so the fit matches the plot
  geom_text(x=3,y=30,label=lm_eqn(mtcars, 'mpg','wt'),color='red',parse=TRUE) +
  geom_smooth(method='lm')


Mark Neal

3 Answers


I have updated 'ggpmisc' to make this easy. Version 0.3.4 is now on its way to CRAN, source package is on-line, binaries should be built in a few days' time.

library(ggpmisc) # version >= 0.3.4 !!

ggplot(mtcars, aes(x = wt, y = mpg, group = cyl)) +
  geom_smooth(method="lm")+
  geom_point()+
  stat_poly_eq(formula = y ~ x, 
               aes(label = paste(..eq.label.., ..rr.label.., ..p.value.label.., sep = "*`,`~")), 
               parse = TRUE,
               label.x.npc = "right",
               vstep = 0.05) # sets vertical spacing
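
Since the question also asks about faceting, here is a minimal sketch of the same call with one panel per cylinder count. It assumes ggpmisc >= 0.3.4 as stated above; the object name `p` and the choice to facet on `cyl` are only illustrative, and recent ggplot2 versions may warn that the `..name..` notation is deprecated.

```r
library(ggplot2)
library(ggpmisc)  # assumes version >= 0.3.4, which provides ..p.value.label..

# One panel per cyl value, so each panel gets a single combined label
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_smooth(method = "lm", formula = y ~ x) +
  geom_point() +
  stat_poly_eq(formula = y ~ x,
               aes(label = paste(..eq.label.., ..rr.label.., ..p.value.label..,
                                 sep = "*`,`~")),
               parse = TRUE) +
  facet_wrap(~cyl)

p
```

Because the equation, R2 and p-value are pasted into one label, there is nothing to align by hand within a panel.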


Pedro J. Aphalo

A possible solution with ggpubr is to place the equation and the R2 value at the top of the graph by passing Inf to label.y and Inf or -Inf to label.x (depending on whether you want it on the right or left side of the plot).

The two texts won't be aligned because of the superscript 2 on R, so you will have to tweak vjust and hjust a little to line them up.

It will then work even with faceted graphs that have different scales.

library(ggplot2)
library(ggpubr)

ggplot(mtcars, aes(x = wt, y = mpg, group = cyl))+
  geom_smooth(method="lm")+
  geom_point()+
  stat_regline_equation(label.x = -Inf, label.y = Inf, vjust = 1.5, hjust = -0.1, size = 3)+
  stat_cor(aes(label = paste(..rr.label.., ..p.label.., sep = "*`,`~")),
           label.y= Inf, label.x = Inf, vjust = 1, hjust = 1.1, size = 3)+
  facet_wrap(~cyl, scales = "free")


Does this answer your question?


EDIT: Alternative by manually adding the equation

As described in your similar question (Label ggplot groups using equation with ggpmisc), you can add your equation by passing the text as geom_text:

library(dplyr) # provides %>%, mutate(), group_by() and summarise()

df_mtcars <- mtcars %>% mutate(factor_cyl = as.factor(cyl))

df_label <- df_mtcars %>% group_by(factor_cyl) %>%
  summarise(Inter = lm(mpg~wt)$coefficients[1],
            Coeff = lm(mpg~wt)$coefficients[2],
            pval = summary(lm(mpg~wt))$coefficients[2,4],
            r2 = summary(lm(mpg~wt))$r.squared) %>% ungroup() %>%
  #mutate(ypos = max(df_mtcars$mpg)*(1-0.05*row_number())) %>%
  #mutate(Label2 = paste(factor_cyl,"~Cylinders:~", "italic(y)==",round(Inter,3),ifelse(Coeff <0,"-","+"),round(abs(Coeff),3),"~italic(x)",sep ="")) %>%
  mutate(Label = paste("italic(y)==",round(Inter,3),ifelse(Coeff <0,"-","+"),round(abs(Coeff),3),"~italic(x)",
                       "~~~~italic(R^2)==",round(r2,3),"~~italic(p)==",round(pval,3),sep =""))

# A tibble: 3 x 6
  factor_cyl Inter Coeff   pval    r2 Label                                                                    
  <fct>      <dbl> <dbl>  <dbl> <dbl> <chr>                                                                    
1 4           39.6 -5.65 0.0137 0.509 italic(y)==39.571-5.647~italic(x)~~~~italic(R^2)==0.509~~italic(p)==0.014
2 6           28.4 -2.78 0.0918 0.465 italic(y)==28.409-2.78~italic(x)~~~~italic(R^2)==0.465~~italic(p)==0.092 
3 8           23.9 -2.19 0.0118 0.423 italic(y)==23.868-2.192~italic(x)~~~~italic(R^2)==0.423~~italic(p)==0.012

And you can use it for geom_text as follows:

ggplot(df_mtcars,aes(x = wt, y = mpg, group = factor_cyl, colour= factor_cyl))+
  geom_smooth(method="lm")+
  geom_point()+
  geom_text(data = df_label,
            aes(x = -Inf, y = Inf, 
                label = Label, color = factor_cyl), 
          show.legend = FALSE, parse = TRUE, size = 3,vjust = 1, hjust = 0)+
  facet_wrap(~factor_cyl)


At least it solves the misalignment caused by the superscript 2 on R.
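
The label table above refits the same model four times per group. As a rough sketch, the same labels can be built in base R with one fit per group; `fit_label()` is a hypothetical helper written for this illustration, not part of dplyr, ggpubr or ggpmisc.

```r
# fit_label() is a hypothetical helper: it fits mpg ~ wt once for one group
# and returns the coefficients, p-value, R^2 and a plotmath label string.
fit_label <- function(d) {
  s  <- summary(lm(mpg ~ wt, data = d))  # single fit, summary reused below
  co <- s$coefficients
  inter <- co[1, "Estimate"]
  slope <- co[2, "Estimate"]
  pval  <- co[2, "Pr(>|t|)"]
  data.frame(
    Inter = inter,
    Coeff = slope,
    pval  = pval,
    r2    = s$r.squared,
    Label = paste0("italic(y)==", round(inter, 3),
                   ifelse(slope < 0, "-", "+"), round(abs(slope), 3),
                   "~italic(x)~~~~italic(R^2)==", round(s$r.squared, 3),
                   "~~italic(p)==", round(pval, 3)),
    stringsAsFactors = FALSE
  )
}

# One row per cylinder group, keyed by the same factor used for faceting
groups   <- split(mtcars, mtcars$cyl)
df_label <- do.call(rbind, lapply(groups, fit_label))
df_label$factor_cyl <- factor(names(groups))
df_label
```

The resulting `df_label` has the columns the geom_text() call above expects (`Label` and `factor_cyl`), so it should be usable as a drop-in replacement for the dplyr version.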

dc37
  • That was a revelation about the superscript causing the misalignment. I wonder if it would be possible to add a blank superscript to the equation to sort the alignment without a hjust? – Mark Neal Apr 17 '20 at 08:28
  • Interesting idea ;) However, right now, I do not see how to do it easily – dc37 Apr 17 '20 at 16:37
  • This answer is, I think, good enough; I'll give it a tick unless something else comes along. The next step forward would probably be creating a single string with ggpmisc that has all the relevant stats. Alternatively, maybe I should raise an issue at github for ggpubr to see if it could do something similar with low mental overhead. – Mark Neal Apr 17 '20 at 18:55
  • As you wish ;) I think raising this "issue" with the developers of `ggpubr` could be interesting. Maybe they know a trick to get it done more easily. – dc37 Apr 17 '20 at 18:57
  • This solution does work well with the groups as facet plots, but overplots the strings when you have groups within the same plot. For groups within a plot, as soon as you include vjust arguments, or label.x arguments, you appear to lose the position increments that stop overplotting of each group string. – Mark Neal Apr 17 '20 at 21:15
  • Discussed on related issue for ggpubr github [here](https://github.com/kassambara/ggpubr/issues/247#issuecomment-615469276) – Mark Neal Apr 17 '20 at 21:22
  • I see ;) Thanks for the link. I will keep it in mind and think about it when I have more time ;) – dc37 Apr 17 '20 at 21:24
  • If you change `stat_regline...` to this, `stat_regline_equation(aes(label = paste(..eq.label.., ..rr.label.., sep = "~~~~")), formula = "y~x")` it puts the R2 in both, which fixes alignment. Now I'm trying to work out how to make the r2 part in regline transparent (with `color = "00000000"`) by the method described [here](https://www.infoworld.com/article/3527449/add-color-to-your-ggplot2-text-in-r.html) using css with **ggtext**. – Mark Neal Apr 18 '20 at 07:56
  • Interesting! But less and less user friendly :D Good luck with that ;) – dc37 Apr 19 '20 at 00:36
  • Getting the object/component/element name to then convert to `element_markdown()` is the sticking point for that solution. I also looked at ggpmisc options to return a single string, which I'll revisit if this comes to a dead end. Basically `stat_fit_glance()` appears to get a model summary that doesn't include the p value. Documentation for `stat_poly_eq()` suggests in the heading that a p value could be returned, but doesn't say how and I haven't found an example of it working - see https://www.rdocumentation.org/packages/ggpmisc/versions/0.3.3/topics/stat_poly_eq – Mark Neal Apr 19 '20 at 01:08
  • ggpmisc doesn't look like it can return a single string, see issue raised [here](https://bitbucket.org/aphalo/ggpmisc/issues/34/stat_poly_eq-doesnt-make-p-value-available) – Mark Neal Apr 20 '20 at 03:11
  • I edited my answer to build the equation manually and plot it on your graph, using a workflow similar to the one described in https://stackoverflow.com/questions/61357383/label-ggplot-groups-using-equation-with-ggpmisc/61358526#61358526 – dc37 Apr 22 '20 at 07:36

Here I use ggpmisc, with one call to stat_poly_eq() for the equation (centre top), and one call to stat_fit_glance() for the stats (pvalue and r2). The secret sauce for the alignment is using yhat as the left hand side for the equation, as the hat approximates the text height that then matches the superscript for the r2 - hat tip to Pedro Aphalo for the yhat, shown here.

It would be great to have them as one string, so that horizontal alignment would not be a problem and locating the label conveniently in the plot space would be easier. I've raised this as an issue at both ggpubr and ggpmisc.

I'll happily accept another better answer!

library(dplyr)   # provides %>% and mutate()
library(ggpmisc)

df_mtcars <- mtcars %>% mutate(factor_cyl = as.factor(cyl))

my_formula <- "y~x"

ggplot(df_mtcars, aes(x = wt, y = mpg, group = factor_cyl, colour= factor_cyl))+
  geom_smooth(method="lm")+
  geom_point()+
  stat_poly_eq(formula = my_formula,
               label.x = "centre",
               eq.with.lhs = "italic(hat(y))~`=`~",
               aes(label = paste(..eq.label.., sep = "~~~")), 
               parse = TRUE)+
  stat_fit_glance(method = 'lm',
                  method.args = list(formula = my_formula),
                  #geom = 'text',
                  label.x = "right", #added to prevent overplotting
                  aes(label = paste("~italic(p) ==", round(..p.value.., digits = 3),
                                    "~italic(R)^2 ==", round(..r.squared.., digits = 2),
                                    sep = "~")),
                  parse=TRUE)+
  theme_minimal()

plot result

Note facet also works neatly, and you could have different variables for the facet and grouping and everything still works.

plot facet result

Note: If you do use the same variable for group and for facet, adding label.y = Inf to each call will force the label to the top of each facet (hat tip @dc37, in another answer to this question).

Mark Neal
  • Also, if you (like me) hate pointlessly small p values being displayed, use `aes(label = paste("~italic(p) ==", ifelse(..p.value.. <0.001, " '<0.001' ", round(..p.value.., digits = 3)), "~italic(R)^2 ==", round(..r.squared.., digits = 2), sep = "~")),` in the call to `stat_fit_glance()` – Mark Neal Apr 20 '20 at 05:04
  • Really nice answer ;) I will look at that carefully ;) – dc37 Apr 20 '20 at 05:15
  • If you want to label more directly, and remove the legend, the approach [here](https://stackoverflow.com/questions/61357383/label-ggplot-groups-using-equation-with-ggpmisc) will be of interest. The use of `ggplot_build()` also suggests a possible option for creating a single string by joining the two strings and deleting the duplicate. – Mark Neal Apr 22 '20 at 04:39
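
The p-value thresholding suggested in the first comment can be pulled out into a small helper so the aes() call stays readable. This is only a sketch: `format_p()` and its `digits` and `floor` arguments are hypothetical names introduced here, not part of ggpmisc or ggpubr.

```r
# format_p() is a hypothetical helper: it returns a plotmath-ready string and
# quotes the "<0.001" case so that parse = TRUE treats it as literal text.
format_p <- function(p, digits = 3, floor = 0.001) {
  ifelse(p < floor,
         paste0("'<", format(floor, scientific = FALSE), "'"),
         as.character(round(p, digits)))
}

# Build a label the same way the comment does, then check that plotmath accepts it
labels <- paste("italic(p) ==", format_p(c(0.0137, 0.00002)))
parse(text = labels)
```

Assuming the function is defined before the plot is built, it could be used inside the label mapping as `format_p(..p.value..)`. Note that round() drops trailing zeros, so 0.05 is shown as 0.05 rather than 0.050.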