We are comparing two datasets (proteome and transcriptome) with matching groups, except that one group is missing from the proteome data. Because I plot the graphs one below the other, they would be much easier to compare if the missing group were still represented in the plot.

For this, I need a way to plot an "empty" group along the x-axis of a ggplot boxplot. Similar questions have been asked here and elsewhere, but I couldn't get any of the solutions to work for me because my groups are characters, and I get the error: Error in factor("a", "b", "c", "d", "e"): unused argument ("e")

example data

I guess there is an easier way to create this, but it works. Happy to hear of a better solution!

library(tidyverse)

crete_exp_df <- function(gene_nr, sample_nr){
  df <- replicate(sample_nr, rnorm(gene_nr))
  df <- as.data.frame(df)
  colnames(df) <- paste("Sample", c(1:ncol(df)))
  rownames(df) <- paste("Gene", c(1:nrow(df)))
  return(df)
}

df1 <- crete_exp_df(5, 20)

# sample annotation

san <- data.frame(
  id = colnames(df1),
  group = sample(letters[1:4], 20, replace = TRUE))


df1$gene <- rownames(df1)

# long exp4 box ----
df_long <- df1 %>% pivot_longer(!gene, names_to = "id", values_to = "value")
df_long$group <- as.factor(san$group[match(df_long$id, san$id)])

# plot
df_long %>%
  filter(gene == "Gene 1") %>%
  ggplot(aes(x = group, y = value)) +
  geom_boxplot()

expected result

[image: boxplot with an empty slot on the x-axis for the missing group]

Sebastian Hesse

1 Answer

You could add the extra level when creating your factor column. Then use scale_x_discrete with the breaks you want to show, and set drop=FALSE so that the empty group is not removed, like this:

library(tidyverse)

# expression dataframe
crete_exp_df <- function(gene_nr, sample_nr){
  df <- replicate(sample_nr, rnorm(gene_nr))
  df <- as.data.frame(df)
  colnames(df) <- paste("Sample", c(1:ncol(df)))
  rownames(df) <- paste("Gene", c(1:nrow(df)))
  return(df)
}

df1 <- crete_exp_df(5, 20)

# sample annotation

san <- data.frame(
  id = colnames(df1),
  group = sample(1:4, 20, replace = TRUE))


df1$gene <- rownames(df1)

# long exp4 box ----
df_long <- df1 %>% pivot_longer(!gene, names_to = "id", values_to = "value")
df_long$group <- factor(san$group[match(df_long$id, san$id)], levels = 1:5)

# plot
df_long %>%
  filter(gene == "Gene 1") %>%
  ggplot(aes(x = group, y = value)) +
  geom_boxplot() +
  scale_x_discrete("group", breaks=factor(1:5), drop=FALSE)

If you want the same order as in your expected output, you can change the order of the breaks by changing the order of the factor levels, like this:

# long exp4 box ----
df_long <- df1 %>% pivot_longer(!gene, names_to = "id", values_to = "value")
df_long$group <- factor(san$group[match(df_long$id, san$id)], levels = c(1,2,3,5,4))

# plot
df_long %>%
  filter(gene == "Gene 1") %>%
  ggplot(aes(x = group, y = value)) +
  geom_boxplot() +
  scale_x_discrete("group", breaks=factor(1:5), drop=FALSE)

Created on 2023-04-27 with reprex v2.0.2
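
The same idea should carry over to character groups. The error quoted in the question suggests that the group names were passed to factor() as separate arguments rather than as one vector given to levels. Below is a minimal sketch under that assumption; the data frame df, the vector all_groups, and the seed are made up for illustration and are not part of the original data.

```r
library(ggplot2)

set.seed(1)  # assumption: fixed seed only so the sketch is reproducible

# Hypothetical data: four observed groups "a" to "d"; group "e" has no rows
df <- data.frame(
  group = sample(letters[1:4], 40, replace = TRUE),
  value = rnorm(40)
)

# Supply ALL group names, including the missing one, as a single
# character vector via `levels =` (not as separate arguments to factor())
all_groups <- c("a", "b", "c", "d", "e")
df$group <- factor(df$group, levels = all_groups)

p <- ggplot(df, aes(x = group, y = value)) +
  geom_boxplot() +
  scale_x_discrete(drop = FALSE)  # keep the unused level "e" on the axis
p
```

Reordering all_groups (for example c("a", "b", "c", "e", "d")) changes where the empty slot appears, in the same way as reordering the numeric levels above.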

Quinten
  • I will need to rephrase the question, as my groups are characters and the approach does not seem to be transferable. I will redo the example code. I get: Error in factor("MB", "PM", "MC", "MM", "B", "S", "PMN") [= my groups]: unused argument ("PMN") [my data contains no S] – Sebastian Hesse Apr 27 '23 at 08:22