
I got this plot

[Plot: carat vs. new_price, faceted by clarity and colored by cut, with lm fits and R² labels]

using the code below:

library(dplyr) 
library(ggplot2)
library(ggpmisc)

df <- diamonds %>%
  dplyr::filter(cut %in% c("Fair", "Ideal")) %>%
  dplyr::filter(clarity %in% c("I1", "SI2", "SI1", "VS2", "VS1", "VVS2")) %>%
  dplyr::mutate(new_price = ifelse(cut == "Fair",
                                   price * 0.5,
                                   price * 1.1))

formula <- y ~ x
ggplot(df, aes(x = new_price, y = carat, color = cut)) +
  geom_point(alpha = 0.3) +
  facet_wrap(~clarity, scales = "free_y") +
  geom_smooth(method = "lm", formula = formula, se = FALSE) +
  # R^2 label per cut; after_stat() needs ggplot2 >= 3.3.0, and recent ggpmisc
  # versions take label.x/label.y (npc values) instead of label.x.npc/label.y.npc
  stat_poly_eq(aes(label = after_stat(rr.label)),
               label.x = "right", label.y = 0.15,
               formula = formula, parse = TRUE, size = 3)

In addition to R², I want to add p-values to the facets. I can do this manually by running the regressions first, extracting the p-values, and adding them with geom_text(), similar to the answer to this question.

Is there a faster or automated way to do this, e.g. similar to the way the R² values were added?

Update

The p-value I'm talking about is the slope p-value. The trends are considered highly statistically significant when p < 0.005.
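A minimal sketch of the manual route described above, assuming the dplyr/ggplot2 setup from the question: fit carat ~ new_price separately for every clarity × cut combination with base lm(), take the slope's p-value from summary(), and draw it with geom_text(). The names slope_p and p are hypothetical, and the label placement (top-right corner, the vjust values, the "p < 0.0001" cutoff) is an arbitrary illustrative choice, not something ggplot2 or ggpmisc prescribes.

```r
library(dplyr)
library(ggplot2)

# Same data preparation as in the question
df <- diamonds %>%
  dplyr::filter(cut %in% c("Fair", "Ideal"),
                clarity %in% c("I1", "SI2", "SI1", "VS2", "VS1", "VVS2")) %>%
  dplyr::mutate(new_price = ifelse(cut == "Fair", price * 0.5, price * 1.1))

# Fit carat ~ new_price for every clarity x cut combination (drop = TRUE skips
# empty combinations) and keep the p-value of the t-test on the slope
groups <- split(df, list(df$clarity, df$cut), drop = TRUE)
slope_p <- do.call(rbind, lapply(groups, function(d) {
  fit <- lm(carat ~ new_price, data = d)
  data.frame(
    clarity = d$clarity[1],
    cut     = d$cut[1],
    p.value = summary(fit)$coefficients["new_price", "Pr(>|t|)"]
  )
}))
rownames(slope_p) <- NULL

# Hypothetical layout: stack the two labels in the top-right corner of each
# panel; very small p-values are shown as "p < 0.0001" instead of "p = 0"
slope_p$vjust <- ifelse(slope_p$cut == "Fair", 1.5, 3)
slope_p$label <- ifelse(slope_p$p.value < 0.0001, "p < 0.0001",
                        paste0("p = ", signif(slope_p$p.value, 3)))

p <- ggplot(df, aes(x = new_price, y = carat, color = cut)) +
  geom_point(alpha = 0.3) +
  facet_wrap(~clarity, scales = "free_y") +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  geom_text(data = slope_p,
            aes(x = Inf, y = Inf, label = label, vjust = vjust),
            hjust = 1.1, size = 3, show.legend = FALSE)
print(p)
```

Because slope_p carries the clarity and cut columns, facet_wrap() places each label in its panel and the color scale matches it to its fit line.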

Asked by shiny (edited by Community)
  • Isn't this a duplicate of [this question](http://stackoverflow.com/questions/26564434/automaticly-add-p-values-to-facet-plot?lq=1)? It basically tells you to use `summarize()`. – Manuel R May 31 '16 at 07:16
  • Please see the [answer](http://stackoverflow.com/a/35140066/3817004) to [ggplot2: Adding Regression Line Equation and R2 on graph](http://stackoverflow.com/q/7549694/3817004) by the author of the `ggpmisc` package for more details, or contact the author. – Uwe May 31 '16 at 07:37
  • Have you looked at `stat_fit_glance`? Source: https://cran.r-project.org/web/packages/ggpmisc/vignettes/examples.html – bVa Jun 08 '16 at 13:53
  • p-value does not mean "the probability that each trend is significantly different from zero" – C8H10N4O2 Jun 08 '16 at 17:15

1 Answer


Use stat_fit_glance, which is part of the ggpmisc package in R. The package is an extension of ggplot2, so it works well with it.

ggplot(df, aes(x = new_price, y = carat, color = cut)) +
  geom_point(alpha = 0.3) +
  facet_wrap(~clarity, scales = "free_y") +
  geom_smooth(method = "lm", formula = formula, se = FALSE) +
  stat_poly_eq(aes(label = after_stat(rr.label)),
               label.x = "right", label.y = 0.15,
               formula = formula, parse = TRUE, size = 3) +
  # fits lm() per panel and group and summarises it with glance();
  # geom "text_npc" lets label.x/label.y be given in npc units
  stat_fit_glance(method = "lm",
                  method.args = list(formula = formula),
                  geom = "text_npc",
                  aes(label = paste0("P-value = ", signif(after_stat(p.value), digits = 4))),
                  label.x = "right", label.y = 0.35, size = 3)

stat_fit_glance fits the model (here with lm()) separately in each panel and group, summarises the fit with glance() from the broom package, and makes those summary values, such as p.value, available for ggplot2 labels. The user guide covers stat_fit_glance and related functions: https://cran.r-project.org/web/packages/ggpmisc/vignettes/user-guide.html. Note that this gives the model p-value (from the overall F-test), not the slope p-value; in multiple linear regression the two differ. For simple linear regression, as here (one predictor, y ~ x), they are identical, because the F-statistic equals the square of the slope's t-statistic.

Here is the plot:

[Plot: the same faceted figure, now also showing a P-value label for each cut in every panel]
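A quick base-R check of the point about model vs. slope p-values. It uses R's built-in cars and mtcars datasets (an illustrative choice, not the question's data) and computes the model p-value from the overall F-statistic, which is the p.value that glance() reports for an lm fit.

```r
# Simple linear regression (one predictor) on the built-in cars data
s <- summary(lm(dist ~ speed, data = cars))

# Model p-value: upper tail of the overall F-test
f <- s$fstatistic
p_model <- pf(f[["value"]], f[["numdf"]], f[["dendf"]], lower.tail = FALSE)

# Slope p-value: two-sided t-test on the speed coefficient
p_slope <- s$coefficients["speed", "Pr(>|t|)"]

all.equal(p_model, p_slope)                                    # TRUE
all.equal(f[["value"]], s$coefficients["speed", "t value"]^2)  # TRUE: F = t^2

# Multiple regression (two predictors, built-in mtcars): the F-test now tests
# both slopes jointly, so its p-value differs from the p-value of wt alone
s2 <- summary(lm(mpg ~ wt + hp, data = mtcars))
f2 <- s2$fstatistic
p_model2 <- pf(f2[["value"]], f2[["numdf"]], f2[["dendf"]], lower.tail = FALSE)
p_wt <- s2$coefficients["wt", "Pr(>|t|)"]
c(model = p_model2, wt = p_wt)
```

With a single predictor the F-test and the slope t-test are the same test, so the stat_fit_glance() label can be read as the slope p-value; with several predictors, or a formula such as y ~ poly(x, 2), they are no longer interchangeable.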

Answered by akash87 (edited by Mikko)
  • Many thanks for your time and help. In my analysis, the slope p-value is different from the model p-value. – shiny Jun 08 '16 at 20:51
  • FYI there is a typo in your package name. It should be `ggpmisc`, not `ggmisc`. Cheers :) – J.Con Aug 03 '17 at 05:49
  • I get a `Warning: Ignoring unknown parameters: label.x.npc, label.y.npc` and `Error: Discrete value supplied to continuous scale` if I copy and paste the data and formula from the question and the `ggplot` from the answer. – CrunchyTopping Aug 15 '19 at 19:34
  • @CrunchyTopping `label.x.npc` and `label.y.npc` have both been deprecated; I suggest using `label.x` and `label.y` instead. You may also want to create `my.formula <- y ~ x` and change `formula = formula` to `formula = my.formula`; it should then work. – Lime Dec 06 '19 at 15:33