I like the neatness of using facet_wrap() or facet_grid() with ggplot since the plots are all made to be the same size and are fitted row and column wise automatically.

I have a data frame and I am experimenting with various transformations and their impact on fit, as measured by R-squared:

dm1 <- lm(price ~ x, data = diamonds)
dm1R2 <- summary(dm1)$r.squared #0.78

dm2 <- lm(log(price) ~ x, data = diamonds)
dm2R2 <- summary(dm2)$r.squared # 0.9177831

dm3 <- lm(log(price) ~ x^2, data = diamonds)
dm3R2 <- summary(dm3)$r.squared # also 0.9177831. Aside, why?

ggplot(diamonds, aes(x = x, y = price)) +
  geom_point() +
  geom_smooth(method = "lm", se = F) +
  geom_text(x = 3.5, y = 10000, label = paste0('R-Squared: ', round(dm1R2, 3)))

ggplot(diamonds, aes(x = x, y = log(price))) +
  geom_point() +
  geom_smooth(method = "lm", se = F) +
  geom_text(x = 3, y = 9, label = paste0('R-Squared: ', round(dm2R2, 3)))

ggplot(diamonds, aes(x = x^2, y = log(price))) +
  geom_point() +
  geom_smooth(method = "lm", se = F) +
  geom_text(x = 3, y = 20, label = paste0('R-Squared: ', round(dm3R2, 3)))

This produces three completely separate plots. Within an Rmd file they appear one after the other.

Is there a way to add them to a grid like when using facet_wrap?

Doug Fir
  • Try the cowplot package – MDEWITT Sep 13 '19 at 00:31
  • Take a look at [egg](https://cran.r-project.org/web/packages/egg/vignettes/Ecosystem.html) or [cowplot](https://cran.r-project.org/web/packages/cowplot/vignettes/introduction.html) or [patchwork](https://github.com/thomasp85/patchwork). Note that facets have a specific use case, different from just a simple grid. Use them when you want to split by a variable and have comparable axes. – neilfws Sep 13 '19 at 00:34
  • Possible duplicate of [Side-by-side plots with ggplot2](https://stackoverflow.com/questions/1249548/side-by-side-plots-with-ggplot2) – camille Sep 13 '19 at 01:58

2 Answers


You can use ggplot2's built-in faceting if you generate a "long" data frame from the regression model objects. The model object returned by lm includes the data used to fit the model, so we can extract the data and the r-squared for each model, stack them into a single data frame, and generate a faceted plot.
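As a minimal sketch of what "includes the data" means here (assuming the full diamonds data from ggplot2 rather than a subset), the fitted object keeps its model frame in the `model` element. The columns are named after the formula terms and already hold the transformed values, which is why they need renaming before models with different formulas can be stacked:

```r
# Fit one of the models from the question on the full diamonds data
m <- lm(log(price) ~ x, data = ggplot2::diamonds)

# The model frame is stored on the fitted object
model_cols <- names(m$model)
print(model_cols)

# The response column holds log(price), not the raw price
print(head(m$model))
```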

The disadvantage of this approach is that you lose the ability to easily set separate x-axis and y-axis titles for each panel, which is important, because the x and y values have different transformations in different panels. In an effort to mitigate that problem, I've used the model formulas as the facet labels.

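Also, the reason you got the same r-squared for the models specified by log(price) ~ x and log(price) ~ x^2 is that R treats them as the same model. Inside a model formula, ^ is a formula operator (it crosses terms up to the given degree) rather than arithmetic exponentiation, so x^2 reduces to just x. To tell R that you literally mean x squared, you need to wrap it in the I() function, making the formula log(price) ~ I(x^2). You could also use log(price) ~ poly(x, 2, raw=TRUE), but note that this fits both the linear and the quadratic term, so it is equivalent to log(price) ~ x + I(x^2) rather than to I(x^2) alone.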
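One quick way to check how R reads a formula is to look at its expanded term labels. This is only a sketch using base R's terms(); the variable names f_plain and f_literal are made up for illustration, and no data is needed:

```r
# `^` inside a formula is a formula operator, so this collapses to `x`
f_plain <- log(price) ~ x^2

# I() protects the arithmetic meaning of `^`
f_literal <- log(price) ~ I(x^2)

labels_plain <- attr(terms(f_plain), "term.labels")
labels_literal <- attr(terms(f_literal), "term.labels")

print(labels_plain)    # "x"
print(labels_literal)  # "I(x^2)"
```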

library(tidyverse)
theme_set(theme_bw(base_size=14))

# Generate a small subset of the diamonds data frame 
set.seed(2)
dsub = diamonds[sample(1:nrow(diamonds), 2000), ]

dm1 <- lm(price ~ x, data = dsub)
dm2 <- lm(log(price) ~ x, data = dsub)
dm3 <- lm(log(price) ~ I(x^2), data = dsub)

# Create long data frame from the three model objects
dat = list(dm1, dm2, dm3) %>% 
  map_df(function(m) {    
    tibble(r2=summary(m)$r.squared,
           form=as_label(formula(m))) %>% 
      cbind(m[["model"]] %>% set_names(c("price","x")))
    }, .id="Model") %>% 
  mutate(form=factor(form, levels=unique(form)))

# Create data subset for geom_text
text.dat = dat %>% group_by(form) %>% 
  summarise(x = quantile(x, 1), 
            price = quantile(price, 0.05), 
            r2=r2[1])

dat %>% 
  ggplot(aes(x, price)) +
  geom_point(alpha=0.3, colour="red") +
  geom_smooth(method="lm") +
  geom_text(data=text.dat, parse=TRUE,
            aes(label=paste0("r^2 ==", round(r2, 2))), 
            hjust=1, size=3.5, colour="grey30") +
  facet_wrap(~ form, scales="free") 

[Image: the resulting faceted plot, one panel per model formula]

eipi10
  • This was really, really great. Thanks a lot! – Doug Fir Sep 13 '19 at 17:33
  • May I ask a follow up. From what I can tell if I use a quadratic term I should include the original term in my model. I tried adding both `x` and `I(x^2)` in model 3 but then I get an error ' Error: `nm` must be `NULL` or a character vector the same length as `x`' I think it might be to do with the `set_names()` part and I tried adding with `cbind(m[["model"]] %>% set_names(c("price","x", "I(x^2)")))` which gave the same error. Any ideas? – Doug Fir Sep 13 '19 at 18:12
  • It's a bit more complicated, because now you effectively have independent variables. I'll think about this and update my answer. – eipi10 Sep 13 '19 at 19:14
  • That should have been "you effectively have two independent variables" – eipi10 Sep 13 '19 at 21:16

ggarrange from the ggpubr package can do this:

library(ggplot2)

# dm1R2, dm2R2 and dm3R2 are the r-squared values computed in the question.
# annotate() draws each label once; geom_text() would redraw it for every row.
p1 <- ggplot(diamonds, aes(x = x, y = price)) +
    geom_point() +
    geom_smooth(method = "lm", se = FALSE) +
    annotate("text", x = 3.5, y = 10000, label = paste0('R-Squared: ', round(dm1R2, 3)))

p2 <- ggplot(diamonds, aes(x = x, y = log(price))) +
    geom_point() +
    geom_smooth(method = "lm", se = FALSE) +
    annotate("text", x = 3, y = 9, label = paste0('R-Squared: ', round(dm2R2, 3)))

p3 <- ggplot(diamonds, aes(x = x^2, y = log(price))) +
    geom_point() +
    geom_smooth(method = "lm", se = FALSE) +
    annotate("text", x = 3, y = 20, label = paste0('R-Squared: ', round(dm3R2, 3)))

# Arrange the three plots on a 2 x 2 grid with aligned panels
ggpubr::ggarrange(p1, p2, p3, ncol = 2, nrow = 2, align = "hv")

Other packages that have been suggested in the comments like cowplot and patchwork also offer good options for this.
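For example, a rough patchwork equivalent might look like the sketch below. It assumes the patchwork package is installed, uses a 2000-row sample to keep plotting fast, and leaves out the r-squared labels; make_panel is a hypothetical helper, not part of any package:

```r
library(ggplot2)
library(patchwork)

# Work on a random sample so the example renders quickly
set.seed(2)
dsub <- diamonds[sample(nrow(diamonds), 2000), ]

# Hypothetical helper: scatter plot plus linear fit for a given mapping
make_panel <- function(mapping) {
  ggplot(dsub, mapping) +
    geom_point(alpha = 0.3) +
    geom_smooth(method = "lm", formula = y ~ x, se = FALSE)
}

p1 <- make_panel(aes(x = x, y = price))
p2 <- make_panel(aes(x = x, y = log(price)))
p3 <- make_panel(aes(x = x^2, y = log(price)))

# Lay the three plots out on a two-column grid
combined <- wrap_plots(p1, p2, p3, ncol = 2)
combined
```

With cowplot, cowplot::plot_grid(p1, p2, p3, ncol = 2) should give a similar layout.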

Marius