My data looks like this:

data <- structure(list(code = 1:12, outcome1 = c(75L, 76L, 77L, 78L, 
80L, 82L, 85L, 84L, 78L, 84L, 84L, 75L), outcome2 = c(50L, 55L, 
54L, 52L, 56L, 58L, 59L, 54L, 52L, 56L, 56L, 57L), response1 = c(1500L, 
1800L, 1789L, 1200L, 1400L, 1900L, 1800L, 1100L, 1450L, 1750L, 
1770L, 1000L), response2 = c(100L, 111L, 120L, 140L, 144L, 156L, 
147L, 189L, 165L, 154L, 132L, 171L)), class = "data.frame", row.names = c(NA, 
-12L))

I have 8 outcome variables and 22 response variables.

I would like to create a series of regression plots for all outcome * response combinations. Is there a fast and easy way to do this?

For example: Outcome1 * Response1, Outcome1 * Response2, Outcome2 * Response1, and so on.

Here is example code for creating one such graph (outcome1 by response1):

library(ggplot2)

ggplot(data = data, aes(x = outcome1, y = response1)) + 
  geom_point(color = "blue") +
  geom_smooth(method = "lm", se = FALSE)

Edit: I have thought about faceting, but it may not work here because, for each outcome (x), the various responses (y) are in different units, so the y-axis scales are not comparable across responses.

acylam
DiscoR

1 Answer

ggplot2 works best with data in long format, so reshape the data before plotting. We can then use facet_grid to create the plots:

library(ggplot2)
library(dplyr)
library(tidyr)

data %>%
  gather(var1, value1, outcome1:outcome2) %>%
  gather(var2, value2, response1:response2) %>%
  ggplot(aes(x = value1, y = value2)) + 
  geom_point(color='blue') +
  geom_smooth(method = "lm", se = FALSE) +
  facet_grid(var2 ~ var1, scales = "free", switch = "both",
             labeller = as_labeller(c(response1 = "response1 (mm)",
                          response2 = "response2 (kg)",
                          outcome1 = "outcome1 (index)",
                          outcome2 = "outcome2 (index)"))) +
  labs(title = "Regression Plot Matrix", x = NULL, y = NULL) +
  theme_bw() +
  theme(strip.placement = "outside",
        strip.background = element_blank())

Notes:

  1. Since the variables can have different scales, we use scales = "free" in facet_grid so that each row of panels gets its own y-axis range and each column its own x-axis range.

  2. switch = "both" moves the facet strip labels to the opposite side: column labels go from the top to the bottom, and row labels from the right to the left.

  3. labeller allows us to supply a named vector and change the strip labels as desired.

  4. strip.placement = "outside" sets the strip labels outside of the axis ticks, while strip.background = element_blank() removes the grey strip label background (inspired by this answer by aosmith)

  5. labs(..., x = NULL, y = NULL) removes the default axis labels, effectively treating the modified facet strip labels as axis labels.
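With 8 outcomes and 22 responses, typing out column ranges gets tedious. The reshaping step can also be sketched with tidyr's pivot_longer(), which supersedes gather(), selecting columns by name prefix. This is only a sketch: it assumes every outcome column starts with "outcome" and every response column starts with "response", which holds for the sample data but may not hold for the real column names.

```r
library(dplyr)
library(tidyr)

# Sample data from the question
data <- data.frame(
  code = 1:12,
  outcome1 = c(75, 76, 77, 78, 80, 82, 85, 84, 78, 84, 84, 75),
  outcome2 = c(50, 55, 54, 52, 56, 58, 59, 54, 52, 56, 56, 57),
  response1 = c(1500, 1800, 1789, 1200, 1400, 1900, 1800, 1100, 1450, 1750, 1770, 1000),
  response2 = c(100, 111, 120, 140, 144, 156, 147, 189, 165, 154, 132, 171)
)

# Assumption: columns are identified by their name prefix.
# Stack the outcome columns first, then the response columns,
# so every outcome value is paired with every response value.
long <- data %>%
  pivot_longer(starts_with("outcome"), names_to = "var1", values_to = "value1") %>%
  pivot_longer(starts_with("response"), names_to = "var2", values_to = "value2")

# One row per observation per outcome/response pair: 12 * 2 * 2 = 48
nrow(long)
```

The resulting `long` data frame has the same var1, value1, var2, and value2 columns as the gather() version, so it can be piped into the same ggplot() call.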

Output:

> data %>%
+   gather(var1, value1, outcome1:outcome2) %>%
+   gather(var2, value2, response1:response2)

   code     var1 value1      var2 value2
1     1 outcome1     75 response1   1500
2     2 outcome1     76 response1   1800
3     3 outcome1     77 response1   1789
4     4 outcome1     78 response1   1200
5     5 outcome1     80 response1   1400
6     6 outcome1     82 response1   1900
7     7 outcome1     85 response1   1800
8     8 outcome1     84 response1   1100
9     9 outcome1     78 response1   1450
10   10 outcome1     84 response1   1750
11   11 outcome1     84 response1   1770
12   12 outcome1     75 response1   1000
13    1 outcome2     50 response1   1500
14    2 outcome2     55 response1   1800
...

[Image: the resulting regression plot matrix]
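If a single grid becomes too large to read, an alternative is to build one standalone plot per outcome/response pair, so that each plot carries its own axes and labels. The following is a minimal sketch of that idea rather than part of the facet_grid approach above. As before, it assumes the columns can be identified by the "outcome" and "response" name prefixes, and the helper names (`pairs`, `plots`, `xvar`, `yvar`) are made up for illustration.

```r
library(ggplot2)

# Sample data from the question
data <- data.frame(
  code = 1:12,
  outcome1 = c(75, 76, 77, 78, 80, 82, 85, 84, 78, 84, 84, 75),
  outcome2 = c(50, 55, 54, 52, 56, 58, 59, 54, 52, 56, 56, 57),
  response1 = c(1500, 1800, 1789, 1200, 1400, 1900, 1800, 1100, 1450, 1750, 1770, 1000),
  response2 = c(100, 111, 120, 140, 144, 156, 147, 189, 165, 154, 132, 171)
)

# Assumption: outcome and response columns are identified by name prefix
outcomes  <- grep("^outcome", names(data), value = TRUE)
responses <- grep("^response", names(data), value = TRUE)

# Every outcome/response combination, one row per pair
pairs <- expand.grid(xvar = outcomes, yvar = responses, stringsAsFactors = FALSE)

# Build one ggplot object per pair; .data[[name]] looks a column up by its string name
plots <- Map(function(xvar, yvar) {
  ggplot(data, aes(x = .data[[xvar]], y = .data[[yvar]])) +
    geom_point(color = "blue") +
    geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
    labs(x = xvar, y = yvar)
}, pairs$xvar, pairs$yvar)

names(plots) <- paste(pairs$yvar, "vs", pairs$xvar)
```

An individual plot can then be displayed by name, for example `plots[["response1 vs outcome1"]]`, or the list can be looped over to save each plot to its own file.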

acylam
  • thanks! I forgot to mention that the various outcomes and responses are in different units, so faceting may not work here. For a given outcome (e.g. Outcome1), each response (Response1, Response2, etc.) is measured in different units (grams, cups, %, etc.). – DiscoR Sep 26 '18 at 18:02
  • Does it make sense to scale or standardize the values for your outcomes and responses? – Mitch Sep 26 '18 at 18:04
  • @Mitch I am not sure about this. Four of the outcomes are in the same unit (mm), one is in cm, one is in kg, and the others are indices (for example, BMI). The response variables are in ounces, and some are in cups and teaspoons. – DiscoR Sep 26 '18 at 18:08
  • @DiscoR Notice I have used `scales = "free"`, which allows the plots to have different scales. Would it help if each of them had a separate axis label to indicate the scales? – acylam Sep 26 '18 at 18:11
  • @avid_useR thanks! I just printed this graph and I noticed two things: a) it works, but separate axis labels would help, and b) my graph grid is very large (21*22 cells). Is there any way to break the graph up? Should I just manually create two separate graphs by selecting different variables? – DiscoR Sep 26 '18 at 18:15
  • @DiscoR You can use the `rows` and `cols` arguments in `facet_grid` to customize your grid layout. If it's still too big, then it's probably a better idea to split it into two plots. For the separate axis labels, see my update. – acylam Sep 26 '18 at 18:22
  • @DiscoR: if you want to break your plot into multiple pages, see [this answer](https://stackoverflow.com/a/50930640/786542) – Tung Sep 26 '18 at 19:04