I have this data frame:

A <- c(100,101,102,98,97,93,96)
B <- c("John","Anne","John", "Anne","John","Anne","John")
C <- c("cheap", "cheap", "expensive", "cheap", "expensive", "cheap", "expensive")
D <- c("USA", "Mexico", "Mexico","USA", "Mexico","USA", "Mexico")

dataframe <- data.frame(A, B, C, D)

   A    B         C      D
1 100 John     cheap    USA
2 101 Anne     cheap Mexico
3 102 John expensive Mexico
4  98 Anne     cheap    USA
5  97 John expensive Mexico
6  93 Anne     cheap    USA
7  96 John expensive Mexico

I want to draw several boxplots on the same plot, one for each level of the B, C and D columns.

That gives 6 boxplots in total (John, Anne, cheap, expensive, USA and Mexico), each showing the distribution of the A values for that group.

What confuses me is that each subgroup has a different number of samples to plot.

1 Answer

This problem is a matter of reshaping the data to long format. Then it becomes a standard boxplot.

library(ggplot2)
library(magrittr)
library(tidyr)

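# Stack B, C and D into name/value pairs, keeping A as the measurement
dataframe %>%
  pivot_longer(-A) %>%
  # one box per distinct value (John, Anne, cheap, expensive, USA, Mexico)
  ggplot(aes(value, A)) +
  geom_boxplot()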
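
To see how the reshape copes with groups of different sizes, here is a minimal sketch that inspects the long-format data before plotting. The object names `long` and `counts` are introduced here for illustration only; `name` and `value` are the default column names that `pivot_longer` produces.

```r
library(tidyr)

# Data from the question
A <- c(100, 101, 102, 98, 97, 93, 96)
B <- c("John", "Anne", "John", "Anne", "John", "Anne", "John")
C <- c("cheap", "cheap", "expensive", "cheap", "expensive", "cheap", "expensive")
D <- c("USA", "Mexico", "Mexico", "USA", "Mexico", "USA", "Mexico")
dataframe <- data.frame(A, B, C, D)

# Every A value is repeated once per grouping column (B, C, D),
# so the 7 original rows become 7 * 3 = 21 rows.
long <- pivot_longer(dataframe, -A)
nrow(long)

# Number of A values behind each of the 6 boxplots.
# The counts differ between groups, which is fine: geom_boxplot()
# summarises whatever rows share the same `value`.
counts <- table(long$value)
counts
```

Because each box is computed only from the rows that share a `value`, unequal group sizes need no special handling.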


Rui Barradas
  • Thank you! Do you know how I can arrange the order of these grouped variables in the plot? – João Machado Feb 02 '22 at 17:06
  • @JoãoMachado You can coerce `value` to a factor and reorder the factor levels. Right after `pivot_longer`, include `mutate(value = factor(value, levels = c("John", "Anne", "cheap", "expensive", "USA", "Mexico")))`. Note that `mutate` comes from dplyr, so load that package as well. – Rui Barradas Feb 02 '22 at 18:06
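
The reordering suggested in the comment above could be sketched as follows, assuming dplyr is installed (it provides `mutate` and re-exports `%>%`). The names `level_order`, `long_ordered` and `p` are illustrative and not part of the original answer.

```r
library(ggplot2)
library(dplyr)   # mutate() and %>%
library(tidyr)

# Data from the question
A <- c(100, 101, 102, 98, 97, 93, 96)
B <- c("John", "Anne", "John", "Anne", "John", "Anne", "John")
C <- c("cheap", "cheap", "expensive", "cheap", "expensive", "cheap", "expensive")
D <- c("USA", "Mexico", "Mexico", "USA", "Mexico", "USA", "Mexico")
dataframe <- data.frame(A, B, C, D)

# Desired left-to-right order of the boxes (hypothetical helper variable)
level_order <- c("John", "Anne", "cheap", "expensive", "USA", "Mexico")

# Reshape, then turn `value` into a factor whose levels fix the axis order
long_ordered <- dataframe %>%
  pivot_longer(-A) %>%
  mutate(value = factor(value, levels = level_order))

# ggplot2 places discrete axis categories in factor-level order
p <- ggplot(long_ordered, aes(value, A)) +
  geom_boxplot()
p
```

Without the `factor()` step, ggplot2 orders character values alphabetically, so changing `level_order` is all that is needed to rearrange the boxes.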