I have a dataset (1000 IDs, 9 classes) similar to this one:

ID     Class     Value
1      A         0.014
1      B         0.665
1      C         0.321
2      A         0.234
2      B         0.424
2      C         0.342
...    ...       ...

The Value column contains relative abundances, i.e. the values of all classes for one individual sum to 1.

I would like to create a ggplot geom_bar plot in R where the x axis is not ordered by IDs but by decreasing class abundance, similar to this one:

[image: example plot]

In our example, suppose Class B is the most abundant class across all individuals, followed by Class C and finally Class A. The first bar on the x axis would then be the individual with the highest Class B value, the second bar would be the individual with the second-highest Class B value, and so on.

This is what I tried:

ggplot(df, aes(x=ID, y=Value, fill=Class)) +
  geom_bar(stat="identity") +
  xlab("") +
  ylab("Relative Abundance\n")
Svalf
  • You might find a hint here: https://stackoverflow.com/questions/25664007/reorder-bars-in-geom-bar-ggplot2 – Wolfgang Arnold Sep 25 '18 at 08:57
  • Possible duplicate of "Reorder bars in geom_bar ggplot2" (https://stackoverflow.com/questions/25664007/reorder-bars-in-geom-bar-ggplot2) – LAP Sep 25 '18 at 09:02
  • Thank you. I saw this post before, but it only takes the values into account, not the classes, and I would like to manually sort the classes in this order: B > C > A. – Svalf Sep 25 '18 at 09:04

1 Answer

You can do the reordering before passing the result to ggplot():

library(dplyr)
library(ggplot2)

# sum the abundance for each class, across all IDs, & sort the result
sort.class <- df %>% 
  count(Class, wt = Value) %>%
  arrange(desc(n)) %>%
  pull(Class)

# get ID order, sorted by each ID's abundance in the most abundant class
ID.order <- df %>%
  filter(Class == sort.class[1]) %>%
  arrange(desc(Value)) %>%
  pull(ID)

# factor ID / Class in the desired order
df %>%
  mutate(ID = factor(ID, levels = ID.order)) %>%
  mutate(Class = factor(Class, levels = rev(sort.class))) %>%
  ggplot(aes(x = ID, y = Value, fill = Class)) +
  geom_col(width = 1) #geom_col is equivalent to geom_bar(stat = "identity")

[image: resulting plot]
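If the class order should be fixed by hand (B > C > A, as requested in the comments) and ties in the first class should be broken by the next one, a variation along these lines might work. This is only a sketch: the toy data and the names `class.order` and `p` are made up for illustration, and `pivot_wider()` / `across()` assume tidyr >= 1.0.0 and dplyr >= 1.0.0.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Toy data in the same long format as the question (values are made up)
df <- data.frame(
  ID    = rep(1:4, each = 3),
  Class = rep(c("A", "B", "C"), times = 4),
  Value = c(0.1, 0.6, 0.3,
            0.2, 0.6, 0.2,
            0.3, 0.2, 0.5,
            0.1, 0.7, 0.2)
)

# Assumed manual priority: B first, then C, then A
class.order <- c("B", "C", "A")

# One row per ID and one column per class, then sort by the class
# columns in priority order (each descending), so ties in B fall back to C
ID.order <- df %>%
  pivot_wider(names_from = Class, values_from = Value) %>%
  arrange(across(all_of(class.order), desc)) %>%
  pull(ID)

# Factor ID / Class in the desired order, as in the code above
p <- df %>%
  mutate(ID = factor(ID, levels = ID.order),
         Class = factor(Class, levels = rev(class.order))) %>%
  ggplot(aes(x = ID, y = Value, fill = Class)) +
  geom_col(width = 1)

print(p)
```

Here IDs 1 and 2 share the same Class B value, so their relative position is decided by Class C.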

Sample data:

library(tidyr)

set.seed(1234)
df <- data.frame(
  ID = seq(1, 100),
  A = sample(seq(2, 3), 100, replace = TRUE),
  B = sample(seq(5, 9), 100, replace = TRUE),
  C = sample(seq(3, 7), 100, replace = TRUE),
  D = sample(seq(1, 2), 100, replace = TRUE)
) %>%
  gather(Class, Value, -ID) %>%
  group_by(ID) %>%
  mutate(Value = Value / sum(Value)) %>%
  ungroup() %>% 
  arrange(ID, Class)

> df
# A tibble: 400 x 3
      ID Class  Value
   <int> <chr>  <dbl>
 1     1 A     0.143 
 2     1 B     0.357 
 3     1 C     0.429 
 4     1 D     0.0714
 5     2 A     0.176 
 6     2 B     0.412 
 7     2 C     0.294 
 8     2 D     0.118 
 9     3 A     0.2   
10     3 B     0.4   
# ... with 390 more rows
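In current tidyr, gather() is superseded by pivot_longer(). A rough equivalent of the sample data construction, with a quick check that each ID's abundances sum to 1, could look like the sketch below. The names `df2` and `check` are only illustrative, and the random values may differ from the printout above depending on the R version.

```r
library(dplyr)
library(tidyr)

set.seed(1234)
df2 <- data.frame(
  ID = seq(1, 100),
  A = sample(seq(2, 3), 100, replace = TRUE),
  B = sample(seq(5, 9), 100, replace = TRUE),
  C = sample(seq(3, 7), 100, replace = TRUE),
  D = sample(seq(1, 2), 100, replace = TRUE)
) %>%
  # long format: one row per ID / Class combination
  pivot_longer(-ID, names_to = "Class", values_to = "Value") %>%
  group_by(ID) %>%
  # convert raw values to relative abundances within each ID
  mutate(Value = Value / sum(Value)) %>%
  ungroup() %>%
  arrange(ID, Class)

# sanity check: relative abundances should sum to 1 for every ID
check <- df2 %>%
  group_by(ID) %>%
  summarise(total = sum(Value))

stopifnot(all(abs(check$total - 1) < 1e-8))
```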
Z.Lin