Using ggplot and facet_grid, I'd like to visualize two parallel vectors of values as box plots. My available data:

DF <- data.frame("value" =  runif(50, 0, 1),
                 "value2" = runif(50, 0, 1),
                 "type1" = c(rep("AAAAAAAAAAAAAAAAAAAAAA", 25),
                             rep("BBBBBBBBBBBBBBBBB", 25)),
                 "type2" = rep(c("c", "d"), 25),
                 "number" = rep(2:6, 10))

The code at the moment only lets me visualize one vector of values:

ggplot(DF, aes(y=value, x=type1)) + 
  geom_boxplot(alpha=.3, aes(fill = type1)) + 
  ggtitle("TITLE") + 
  facet_grid(type2 ~ number) +
  scale_x_discrete(name = NULL, breaks = NULL) + # these lines are optional
  theme(legend.position = "bottom")

This is my plot at the moment:

[image: current box plot]

I'd like to draw parallel box plots, one for each vector (value and value2 in the data frame). That is, for each colored box plot, I'd like two box plots: one for value and another for value2.

TheAvenger
  • Can you be more clear about what you're trying to get? It seems like a classic problem of needing to reshape data into a [long format](http://www.cookbook-r.com/Manipulating_data/Converting_data_between_wide_and_long_format/) to fit within `ggplot`'s paradigm. – camille Oct 02 '18 at 15:12
  • It's also worth noting that if you can identify lines as being optional, you'd be better off removing them from the question in order to fit the SO guidelines of keeping questions [minimal](https://stackoverflow.com/help/mcve) – camille Oct 02 '18 at 15:13
  • I'd like to have two parallel box plots for each configuration; value and value2 are two indices I'd like to match. I can probably solve the problem with the melt function or reshape, but I hoped that ggplot could do it easily with a condition. – TheAvenger Oct 02 '18 at 15:37
  • No, `ggplot` expects the data to already be formatted properly. It's a separation of concerns for `tidyr` / `reshape2` functions to reshape the data, then `ggplot2` to plot it – camille Oct 02 '18 at 15:41
  • I'm not marking as a duplicate, since the accepted answer there is quite outdated: https://stackoverflow.com/q/3777174/5325862 – camille Oct 02 '18 at 15:50
  • Something like that, but now the box plots overlap, as in this image: https://drive.google.com/file/d/1k5Q5Stb2N2qnktmgP2g6GfCQyKH7XLAc/view?usp=sharing – TheAvenger Oct 02 '18 at 16:00

2 Answers

I think there's likely a post that already addresses this, in addition to the one I linked to above. But the problem comes down to two things: 1) getting the data into the format that ggplot expects, i.e. long-shaped, so that there are variables to map onto aesthetics, and 2) separation of concerns: you use reshape2 or (more up to date) tidyr functions to get the data into the proper shape, and ggplot2 functions to plot it.

You can use tidyr::gather for getting long data, and conveniently pipe it directly into ggplot.

library(tidyverse)
...

To illustrate, though with very generic column names:

DF %>%
  gather(key, value = val, value, value2) %>%
  head()
#>                    type1 type2 number   key       val
#> 1 AAAAAAAAAAAAAAAAAAAAAA     c      2 value 0.5075600
#> 2 AAAAAAAAAAAAAAAAAAAAAA     d      3 value 0.6472347
#> 3 AAAAAAAAAAAAAAAAAAAAAA     c      4 value 0.7543778
#> 4 AAAAAAAAAAAAAAAAAAAAAA     d      5 value 0.7215786
#> 5 AAAAAAAAAAAAAAAAAAAAAA     c      6 value 0.1529630
#> 6 AAAAAAAAAAAAAAAAAAAAAA     d      2 value 0.8779413

Pipe that directly into ggplot:

DF %>%
  gather(key, value = val, value, value2) %>%
  ggplot(aes(x = key, y = val, fill = type1)) +
    geom_boxplot() +
    facet_grid(type2 ~ number) +
    theme(legend.position = "bottom")

Again, because of the generic column names, I'm not entirely sure this is the setup you want; for example, I don't know how value / value2 relate to AAAAAAA / BBBBBBB. You might need to swap the aes assignments around accordingly.
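As a rough sketch of what swapping the aes assignments could look like: the version below puts type1 on the x axis and fills by the reshaped key, so each type1 gets one box for value and one for value2 side by side. The seed, the object names `long` and `p`, and the short axis labels "A" / "B" are assumptions added here for illustration; they are not part of the question.

```r
library(tidyr)
library(ggplot2)

set.seed(1)  # arbitrary seed, only so the sketch is reproducible

# Same structure as the data frame in the question
DF <- data.frame(value  = runif(50, 0, 1),
                 value2 = runif(50, 0, 1),
                 type1  = rep(c("AAAAAAAAAAAAAAAAAAAAAA", "BBBBBBBBBBBBBBBBB"), each = 25),
                 type2  = rep(c("c", "d"), 25),
                 number = rep(2:6, 10))

# Long format: one row per measurement, `key` says which column it came from
long <- gather(DF, key, value = val, value, value2)

# type1 on x, key on fill: boxes for value and value2 are dodged within each type1
p <- ggplot(long, aes(x = type1, y = val, fill = key)) +
  geom_boxplot(alpha = 0.3) +
  facet_grid(type2 ~ number) +
  scale_x_discrete(name = NULL, labels = c("A", "B")) +  # hypothetical short labels
  theme(legend.position = "bottom")

p
```

Whether type1 or key belongs on the fill is a matter of which comparison should stand out; mapping both onto aesthetics (rather than hard-coding them) is what lets ggplot dodge the boxes instead of overlapping them.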

camille
  • I have done the same thing using ggplot alone: `ggplot(DF, aes(x=type1)) + geom_boxplot(alpha=.3, aes(x="value", y=value, fill=type1)) + geom_boxplot(alpha=.3, aes(x="value2", y=value2, fill=type1)) + ggtitle("TITLE") + facet_grid(type2 ~ number) + scale_x_discrete(name = NULL, breaks = NULL) + theme(legend.position = "bottom")` (the `scale_x_discrete` line is optional). But I'd like to emphasize the two groups, maybe with different shapes. – TheAvenger Oct 02 '18 at 16:24
  • It's only in special cases that you'd want to supply hard-coded values to an `aes` call like you're doing there. The setup of `ggplot` really is to map those types of variables along aesthetics. Maybe check out some `ggplot` tutorials, such as the R Cookbook I linked to above – camille Oct 02 '18 at 17:13

You have to reshape your data frame. Add an indicator column that defines the type of each value (for example "value_type") and keep only one value column. The indicator then matches each value to its value type. The following code builds your example data in that shape:

DF <- data.frame("value" =  c(runif(50, 0, 1), runif(50,0,1)),
                 "value_type" = rep(c("value1","value2"), each=50),
                 "type1" = rep(c(rep("AAAAAAAAAAAAAAAAAAAAAA", 25), 
                                 rep("BBBBBBBBBBBBBBBBB", 25)), 2),
                 "type2" = rep(rep(c("c", "d"), 25), 2), 
                 "number" = rep(rep(2:6, 10),2))

Then call ggplot with an additional color aesthetic:

ggplot(DF, aes(y=value, x=type1, col=value_type)) + 
  geom_boxplot(alpha=.3, aes(fill = type1)) + 
  ggtitle("TITLE") + 
  facet_grid(type2 ~ number) +
  scale_color_manual(values=c("green", "steelblue")) + # set the colors of the value types manually
  scale_x_discrete(name = NULL, breaks = NULL) + # these lines are optional
  theme(legend.position = "bottom")
Freakazoid
  • It probably isn't feasible for the OP to just retype their data like this, though. You want to help them write the code to do that programmatically. – camille Oct 02 '18 at 15:40
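A minimal sketch of doing that reshape programmatically, assuming tidyr 1.0.0 or later (where `pivot_longer` is available as the newer alternative to `gather`). The names `DF_wide`, `DF_long`, and `val`, and the seed, are assumptions made for this illustration; `value_type` follows the column name used in the answer above.

```r
library(tidyr)

set.seed(1)  # arbitrary seed, only so the sketch is reproducible

# Wide data with the same structure as in the question
DF_wide <- data.frame(value  = runif(50, 0, 1),
                      value2 = runif(50, 0, 1),
                      type1  = rep(c("AAAAAAAAAAAAAAAAAAAAAA", "BBBBBBBBBBBBBBBBB"), each = 25),
                      type2  = rep(c("c", "d"), 25),
                      number = rep(2:6, 10))

# Stack value and value2 into one column; value_type records the source column
DF_long <- pivot_longer(DF_wide,
                        cols = c(value, value2),
                        names_to = "value_type",
                        values_to = "val")

head(DF_long)
```

With the measurements in `val`, the plotting call from the answer would use `y = val` in place of `y = value`; the other columns (type1, type2, number) are carried along unchanged.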