1

For a sample dataframe:

   df <- structure(list(year = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 
4L, 4L), letter_group = c("A", "A", "A", "B", "B", "B", "C", 
"C", "C", "C", "A", "A", "A", "B", "B", "B", "C", "C", "C", "C", 
"A", "A", "A", "B", "B", "B", "C", "C", "C", "C", "C", "C", "C", 
"A", "A", "A", "B", "B", "B", "C", "C", "C", "C", "C"), value = c(2L, 
3L, 4L, 5L, 6L, 6L, 7L, 8L, 5L, 6L, 7L, 3L, 4L, 5L, 6L, 4L, 5L, 
6L, 2L, 3L, 4L, 4L, 5L, 6L, 7L, 8L, 5L, 3L, 2L, 4L, 5L, 6L, 4L, 
3L, 4L, 5L, 6L, 7L, 1L, 2L, 4L, 5L, 6L, 4L)), .Names = c("year", 
"letter_group", "value"), row.names = c(NA, -44L), class = c("tbl_df", 
"tbl", "data.frame"), spec = structure(list(cols = structure(list(
    year = structure(list(), class = c("collector_integer", "collector"
    )), letter_group = structure(list(), class = c("collector_character", 
    "collector")), value = structure(list(), class = c("collector_integer", 
    "collector"))), .Names = c("year", "letter_group", "value"
)), default = structure(list(), class = c("collector_guess", 
"collector"))), .Names = c("cols", "default"), class = "col_spec"))

I am trying to create a box plot with the years on the x-axis, and with the letter groups shown side by side within each year,

i.e. A, B, C for year 1, then a small space, then A, B, C for year 2, and so on.

I have the following:

library(ggplot2)

p1 <- ggplot(df, aes(year, value))
p1 + geom_boxplot(aes(group=letter_group))

But this produces only three box plots (one per letter group) rather than one per letter group within each year.

Could someone please help me?

Dan
KT_1
  • 2
    Your grouping variables do not seem to be factors. Is `ggplot(df, aes(as.factor(year), value, fill=as.factor(letter_group))) + geom_boxplot()` what you are looking for? – nouse May 23 '19 at 11:24
  • Thanks @nouse - simple and effective! Perfect! In my real example, I have ten 'letter-groups' - is there a way to specify the order of my variables? (I have 1-10, yet ggplot is ordering them 1, 10, 2-9) – KT_1 May 23 '19 at 12:52
  • @KT_1 Do you mean you have letters running A to J or that your "letter" groups are the numbers 1-10? Try `factor(letter_group, levels = LETTERS[1:10])` or `factor(letter_group, levels = 1:10)`. – Dan May 23 '19 at 13:02
  • Thanks @Lyngbakr - my 'letter_group' is in fact called 'deciles', which runs 1-10. The second of your helpful suggestions gives the error: `Error in as.factor(deciles, levels = 1:10) : unused argument (levels = 1:10)`. What am I doing wrong? – KT_1 May 23 '19 at 13:24
  • 1
    @KT_1 Note that it's `factor` not `as.factor`. (See [here](https://stackoverflow.com/a/39279275/1552004) for an explanation of the differences.) – Dan May 23 '19 at 13:26
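
To illustrate the ordering issue from the comments, here is a minimal sketch in base R. The `deciles` vector is made up for illustration and is assumed to be stored as character, which would explain the 1, 10, 2-9 ordering; the real column may differ.

```r
# Hypothetical decile labels stored as character strings (an assumption;
# character storage is what would produce the 1, 10, 2, ... ordering).
deciles <- as.character(c(10, 2, 1, 9, 3))

# Default factor(): levels are sorted as text, so "10" comes before "2".
default_levels <- levels(factor(deciles))

# factor() with explicit levels: the levels follow the order given.
# as.factor() has no `levels` argument, hence the error in the comment above.
ordered_deciles <- factor(deciles, levels = 1:10)
ordered_levels <- levels(ordered_deciles)

print(default_levels)
print(ordered_levels)
```

Mapping `ordered_deciles` (rather than the raw column) to `fill` should then make ggplot2 draw the boxes in 1-10 order, since it follows the factor's level order.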

3 Answers

3

An alternative to @nouse's solution (which is the best solution) is to use faceting. One benefit of faceting, however, is that you also get letter group labels on the x-axis.

Define data structure

# Load library
library(ggplot2)

# Define data frame
df <- structure(list(year = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
                              2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 
                              3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 
                              4L, 4L), letter_group = c("A", "A", "A", "B", "B", "B", "C", 
                                                        "C", "C", "C", "A", "A", "A", "B", "B", "B", "C", "C", "C", "C", 
                                                        "A", "A", "A", "B", "B", "B", "C", "C", "C", "C", "C", "C", "C", 
                                                        "A", "A", "A", "B", "B", "B", "C", "C", "C", "C", "C"), 
                     value = c(2L, 3L, 4L, 5L, 6L, 6L, 7L, 8L, 5L, 6L, 7L, 3L, 4L, 5L, 6L, 4L, 5L, 
                               6L, 2L, 3L, 4L, 4L, 5L, 6L, 7L, 8L, 5L, 3L, 2L, 4L, 5L, 6L, 4L, 
                               3L, 4L, 5L, 6L, 7L, 1L, 2L, 4L, 5L, 6L, 4L)), 
                .Names = c("year", "letter_group", "value"), 
                row.names = c(NA, -44L), 
                class = c("tbl_df","tbl", "data.frame"), 
                spec = structure(list(cols = structure(list(year = structure(list(), class = c("collector_integer", "collector")), 
                                                             letter_group = structure(list(), class = c("collector_character", "collector")), 
                                                             value = structure(list(), class = c("collector_integer",  "collector"))), 
                                                       .Names = c("year", "letter_group", "value")), 
                                      default = structure(list(), class = c("collector_guess", "collector"))), 
                                 .Names = c("cols", "default"), class = "col_spec"))

Plot results

# Plot results
g <- ggplot(df)
g <- g + geom_boxplot(aes(letter_group, value))
g <- g + facet_grid(. ~ year, switch = "x")
g <- g + theme(strip.placement = "outside",
               strip.background = element_blank(),
               panel.background = element_rect(fill = "white"),
               panel.grid.major = element_line(colour = alpha("gray50", 0.25), linetype = "dashed"))
g <- g + ylab("Value") + xlab("Year & Letter Group")
print(g)

Created on 2019-05-23 by the reprex package (v0.2.1)

Dan
1

Your question has been largely answered here.

Your data frame does not contain factors, so you first need to convert your grouping variables to factors. There are then two options, as described in the link above. You can construct a new factor by combining your two original factors (as shown in z-cool's answer), but this does not create the desired space between years on the x-axis. Alternatively, you can map one of your factors to fill or colour. In your case, the quickest way to solve your problem is

ggplot(df, aes(as.factor(year), value, fill=as.factor(letter_group))) + geom_boxplot()

If you do not want a coloured plot, you can override the colours with scale_fill_manual or scale_color_manual, depending on which aesthetic you used in aes:

ggplot(df, aes(as.factor(year), value, fill=as.factor(letter_group))) + geom_boxplot() +
  scale_fill_manual(values=c("white", "white", "white")) +
  theme(legend.position = "none")
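
As a variation on the above, the conversion to factors can be done once in the data frame rather than inside `aes()`, which also gives one place to set the level order. This is only a sketch: `toy` is a small made-up data frame standing in for the question's `df`, and the axis and legend labels are illustrative.

```r
library(ggplot2)

# Hypothetical toy data standing in for the question's df (values invented).
toy <- data.frame(
  year = rep(1:2, each = 6),
  letter_group = rep(c("A", "B", "C"), times = 4),
  value = c(2, 5, 7, 3, 6, 8, 7, 5, 5, 3, 6, 2)
)

# Convert the grouping columns to factors up front; `levels` fixes the order
# in which the boxes appear within each year.
toy$year <- factor(toy$year)
toy$letter_group <- factor(toy$letter_group, levels = c("A", "B", "C"))

# One box per letter group, dodged side by side within each year.
p <- ggplot(toy, aes(year, value, fill = letter_group)) +
  geom_boxplot() +
  labs(x = "Year", y = "Value", fill = "Letter group")
print(p)
```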
nouse
-1

This should work:

library(tidyverse)
df %>% 
  mutate(year_group = paste(year, letter_group)) %>% 
  ggplot(aes(year_group, value)) +
  geom_boxplot()
z-cool