
From a mouse experiment I have data for about fifty mice covering about 15 different metrics. I generated correlation plots of every metric against every other metric to identify which measurements correlate with each other and which don't.

library(ggplot2)
df <- structure(list(mouse_ID = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 
10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 22L, 23L, 24L, 25L, 
26L, 27L, 28L, 29L, 30L, 31L, 32L, 33L, 34L, 35L, 36L, 37L, 38L, 
39L, 40L, 41L, 42L, 43L, 44L, 45L, 46L, 47L, 48L, 49L, 50L, 51L, 
52L, 53L, 54L, 55L), treatment = structure(c(1L, 1L, 1L, 1L, 
1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 
5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L
), .Label = c("not challenged", "vehicle control", "high", 
"medium", "low", "reference"
), class = "factor"), value.x = c(0.003725, 0.0208, 0.004475, 
0, 0.00895, 1.00625, 1.0125, 1.014, 1.1025, 0.925, 0.897, 0.99, 
1.1495, 1.0125, 1.08, 0.88425, 1.001, 0.864, 0.89175, 0.9425, 
0.943, 1.07325, 0.73575, 0.606, 0.682, 0.79925, 0.87, 0.60225, 
0.756, 0.891, 0.6555, 0.572, 0.253, 0.255, 0.396, 0.4495, 0.299, 
0.39, 0.3, 0.5365, 0.378, 0.475, 0.73575, 0.4895, 0.468, 0.90625, 
0.3905, 0.4995, 0.60375, 0.744, 0.75, 0.5535), value.y = c(0, 
0, 0, 0, 0, 5.775, 4.6875, 4.992, 7.245, 6.0125, 3.795, 4.99125, 
7.26275, 4.35375, 4.3875, 3.6025, 4.389, 3.852, 3.444, 4.205, 
5.207, 4.77, 3.052, 2.65125, 2.024, 3.6835, 2.9, 1.5695, 2.7, 
2.619, 2.964, 1.936, 0.539, 0.408, 1.056, 1.085, 0.897, 0.795, 
0.5, 1.0915, 0.5355, 0.575, 2.8885, 2.0915, 1.755, 3.40625, 1.42, 
1.6095, 2.835, 2.3715, 2.7, 1.927)), row.names = c(NA, -52L), 
class = c("tbl_df", "tbl", "data.frame"))

ggplot(data = df, aes(x = value.x, y = value.y)) +
    geom_point(aes(color = treatment)) +
    # explicit method and formula: linear fit, no "using formula" message
    geom_smooth(method = "lm", formula = y ~ x, se = TRUE)

It turns out that a long list of over 100 plots is really hard to take in, and each plot carries relatively little information. I would like to arrange these scatter plots with their linear fits in a 15 x 15 grid of the measurements, show the correlation coefficient of each linear model as the panel's background color, and overlay the linear fit and the data points.

Is this feasible in ggplot? Is there another tool I could use? And if so, how should I arrange the data structure? I am comfortable working with purrr and nested lists for such models, but in this case a long list does not seem ideal; a matrix-style arrangement would fit the output much better.
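A rough sketch of one possible data structure (not the only one): a long table with one row per metric pair and observation, plus a summary table with one row per pair holding the fit statistics. The helper names `metric_pairs()`, `pairwise_long()` and `pairwise_stats()`, the column names `var_x`, `var_y`, `r` and `p`, and the use of built-in `mtcars` columns as stand-ins for the mouse metrics are all assumptions for illustration. It uses base R only.

```r
# Hypothetical helper: all ordered pairs of distinct metric names
metric_pairs <- function(vars) {
  pairs <- expand.grid(var_x = vars, var_y = vars, stringsAsFactors = FALSE)
  pairs[pairs$var_x != pairs$var_y, ]
}

# Hypothetical helper: long table with one row per (metric pair, animal);
# var_x / var_y can later drive facet_grid(var_y ~ var_x)
pairwise_long <- function(data, vars) {
  pairs <- metric_pairs(vars)
  do.call(rbind, lapply(seq_len(nrow(pairs)), function(i) {
    data.frame(var_x = pairs$var_x[i], var_y = pairs$var_y[i],
               x = data[[pairs$var_x[i]]], y = data[[pairs$var_y[i]]],
               stringsAsFactors = FALSE)
  }))
}

# Hypothetical helper: one row per metric pair with Pearson r and its p-value
pairwise_stats <- function(data, vars) {
  pairs <- metric_pairs(vars)
  tests <- Map(function(a, b) cor.test(data[[a]], data[[b]]),
               pairs$var_x, pairs$var_y)
  pairs$r <- vapply(tests, function(t) unname(t$estimate), numeric(1))
  pairs$p <- vapply(tests, function(t) t$p.value, numeric(1))
  rownames(pairs) <- NULL
  pairs
}

# Usage with stand-in data (mtcars columns instead of mouse metrics)
vars  <- c("mpg", "hp", "wt", "qsec")
long  <- pairwise_long(mtcars, vars)
stats <- pairwise_stats(mtcars, vars)
head(stats)
```

For two numeric vectors, the Pearson `cor.test()` p-value equals the p-value of the slope in `lm(y ~ x)`, so the summary table can double as per-panel linear-model statistics.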

Any thoughts or suggestions on how to approach this?

Created on 2021-01-20 by the reprex package (v0.3.0)

Sorry, my explanation wasn’t clear. The data shown above is only a fraction of what is available: here I plot the linear correlation of two readouts, but I have over a dozen readouts that I used for pairwise comparisons. I am looking for something like this:

Each tile should be colored by a metric of the linear model (e.g. the correlation coefficient or the p-value), but it should also show the plotted data with the linear model overlaid.
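One way to get this layout with ggplot2 alone, sketched under the assumption that faceting a long table by the two metric names is acceptable: `facet_grid(var_y ~ var_x)` arranges the panels as a matrix, a full-panel `geom_rect()` mapped to Pearson r provides the background, and points plus `geom_smooth(method = "lm")` are drawn on top. The `mtcars` columns stand in for the mouse readouts, and the column names `var_x`, `var_y` and `r` are illustrative choices. Diagonal panels stay empty because self-pairs are dropped.

```r
library(ggplot2)

# Stand-in data (assumption): numeric mtcars columns play the role of the mouse readouts
vars <- c("mpg", "hp", "wt", "qsec")

# All ordered pairs of distinct metrics; each pair becomes one panel
pairs <- expand.grid(var_x = vars, var_y = vars, stringsAsFactors = FALSE)
pairs <- pairs[pairs$var_x != pairs$var_y, ]

# Long data: one row per (pair, observation)
long <- do.call(rbind, lapply(seq_len(nrow(pairs)), function(i) {
  data.frame(var_x = pairs$var_x[i], var_y = pairs$var_y[i],
             x = mtcars[[pairs$var_x[i]]], y = mtcars[[pairs$var_y[i]]],
             stringsAsFactors = FALSE)
}))

# Per-panel Pearson r, used for the background fill
pairs$r <- mapply(function(a, b) cor(mtcars[[a]], mtcars[[b]]),
                  pairs$var_x, pairs$var_y, USE.NAMES = FALSE)

p <- ggplot() +
  # full-panel rectangle coloured by r (infinite limits stretch it to the panel edges)
  geom_rect(data = pairs,
            aes(xmin = -Inf, xmax = Inf, ymin = -Inf, ymax = Inf, fill = r),
            alpha = 0.6) +
  geom_point(data = long, aes(x = x, y = y), size = 1) +
  geom_smooth(data = long, aes(x = x, y = y), method = "lm", formula = y ~ x,
              se = TRUE, colour = "black") +
  scale_fill_gradient2(low = "steelblue", mid = "white", high = "firebrick",
                       midpoint = 0, limits = c(-1, 1)) +
  facet_grid(var_y ~ var_x, scales = "free") +
  labs(x = NULL, y = NULL, fill = "Pearson r")

print(p)
```

Mapping `fill` to a fixed `c(-1, 1)` scale keeps the colours comparable across panels; replacing `r` with a p-value column would need a different scale.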

Z.Lin
Mario Niepel
  • You may find the suggestions here relevant: https://www.r-graph-gallery.com/199-correlation-matrix-with-ggally.html – Z.Lin Jan 21 '21 at 04:32
  • I second `GGally::ggpairs` – M.Viking Jan 21 '21 at 04:56
  • Does this answer your question? [Create a matrix of scatterplots (pairs() equivalent) in ggplot2](https://stackoverflow.com/questions/3735286/create-a-matrix-of-scatterplots-pairs-equivalent-in-ggplot2) – tjebo Jan 24 '21 at 11:58

2 Answers


GGally is exactly what I was looking for. It's simple to use and has a number of useful plotting options I will need to explore.
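The answer does not show its code; the following is only a minimal sketch of what such a `GGally::ggpairs()` call might look like, assuming the metrics sit in numeric columns of one data frame (here `mtcars` as stand-in data). `wrap("smooth", ...)` puts a scatter plot with a linear fit in each lower-triangle panel, and the upper triangle shows the correlation coefficient as text.

```r
library(ggplot2)
library(GGally)

# Stand-in data (assumption): numeric mtcars columns instead of the mouse readouts
cars <- mtcars[, c("mpg", "hp", "wt", "qsec")]

p <- ggpairs(
  cars,
  # lower triangle: scatter plot with a linear fit
  lower = list(continuous = wrap("smooth", method = "lm")),
  # upper triangle: correlation coefficient printed as text
  upper = list(continuous = "cor")
)

print(p)
```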

It turns out there may be some issues when the grid gets larger, but right now it's not clear to me whether this is a data issue or a limitation of the plotting function. Lots of stuff to explore, but the simplicity of getting the first plots done is awesome.

Now to figure out how to scale the background color of each mini-plot by the overall correlation coefficient!
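A possible approach, sketched here as an assumption rather than a built-in GGally option: pass a custom panel function to `ggpairs()` that computes the pair's Pearson correlation and sets `panel.background` accordingly. The function name `lm_with_cor_background()` and the blue-white-red palette are hypothetical choices; `GGally::eval_data_col()` pulls the mapped columns out of the panel data.

```r
library(ggplot2)
library(GGally)

# Hypothetical lower-panel function: points + lm fit drawn on a panel
# background whose colour encodes the Pearson correlation of that pair.
lm_with_cor_background <- function(data, mapping, ...) {
  xv <- eval_data_col(data, mapping$x)
  yv <- eval_data_col(data, mapping$y)
  r <- cor(xv, yv, use = "complete.obs")
  # 201-step blue-white-red ramp: r = -1 picks the first colour, r = 1 the last
  ramp <- colorRampPalette(c("steelblue", "white", "firebrick"))(201)
  fill <- ramp[round((r + 1) * 100) + 1]
  ggplot(data = data, mapping = mapping) +
    geom_point(size = 1) +
    geom_smooth(method = "lm", formula = y ~ x, se = TRUE, colour = "black") +
    theme(panel.background = element_rect(fill = fill))
}

# Stand-in data (assumption): numeric mtcars columns
cars <- mtcars[, c("mpg", "hp", "wt", "qsec")]
p <- ggpairs(cars, lower = list(continuous = lm_with_cor_background))
print(p)
```

Because the fill is set per panel through `theme()`, there is no shared fill legend; if a legend is needed, a pure ggplot2 layout with `facet_grid()` and a mapped `fill` scale may be easier.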

Mario Niepel
  • A similar function that wraps around `pairs()` from base R is `chart.Correlation()` from the `PerformanceAnalytics` package. – Ben Norris Jan 21 '21 at 15:08

Are you looking for faceting?

library(ggplot2)

ggplot(df, aes(x = value.x, y = value.y)) +
  geom_point(aes(color = treatment)) + 
  geom_smooth(method = "lm", se = TRUE) +
  facet_wrap(~treatment, labeller = label_both)

If you want to compare combinations of grouping variables, try `facet_grid()`. I'm using the built-in `mtcars` data for this example, since your sample data has only one categorical variable.

ggplot(mtcars, aes(x = hp, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", se = TRUE) +
  facet_grid(cyl ~ am, labeller = label_both)

Ben Norris
  • Sorry, I wasn’t clear with my question. I have amended it to clarify what I am looking to create. – Mario Niepel Jan 21 '21 at 03:47
  • I guess the [corrplot function](https://www.mathworks.com/help/econ/corrplot.html) in MATLAB describes what I’m looking for. Beyond what that graph shows, I think the background color should also reflect the strength of the correlation. – Mario Niepel Jan 21 '21 at 03:58