Data overview: 27 sample sites (bars), 3 phyla/groups shown by color, and a variable number of unique species (grouped into 1 of the three phyla) at each site.

I am trying to put a number on top of each stacked column showing the number of unique species detected at each sample site (bar). The overall goal is to show that diversity (number of species) goes down from left to right. I have the numbers stored in a vector. I tried merging them into the original data frame as a column "count" and then adding geom_text(aes(label=count)), which resulted in the following image, where the per-area species count is repeated on every segment of each bar. Help!

[Image: stacked bar plot with the count label repeated in each bar]

Test data that can be used:

library(ggplot2)

set.seed(123)
test.data <- data.frame(area = rep(1:27, 3),
                        cover = c(rnorm(27, mean = 40),
                                  rnorm(27, mean = 10),
                                  rnorm(27, mean = 5)),
                        phyla = c(rep("c", 27), rep("o", 27), rep("r", 27)))

ggplot(data = test.data, aes(x = area, y = cover, fill = phyla)) +
  geom_col() +
  theme_classic() +
  scale_y_continuous(expand = expansion(mult = c(0, 0.05)),
                     limits = c(0, 75)) +
  scale_fill_manual(values = c('#a6611a', '#dfc27d', '#80cdc1')) +
  geom_vline(xintercept = c(9.5, 18.5, 27.5), linetype = "dotted")

# number I want on top of each bar
count = c(14, 17, 15, 15, 15, 12, 18, 14, 13, 14, 12, 14, 12, 11, 13,
          9, 11, 10,  5,  4,  4,  6,  7,  6,  6,  6,  5)
j-asano
  • For [making a great R reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example), you should provide your data via the output of `dput(df)` or to give a minimal subset use `dput(head(df))`. – AndrewGB Feb 21 '22 at 22:05
  • It's not clear what format an answer could take here since you haven't shared your code or data. It seems fairly clear what the problem is, but attempting to give you an answer would mean guessing what your variables are called and what the structure of your plot call is, and this is likely to cause frustration and requests for clarifications in the comments. Please help us to help you by making your question reproducible. – Allan Cameron Feb 21 '22 at 22:08
  • Thank you! I am still pretty new here (and to R in general). I added in a code block that will hopefully suffice in doing a reproducible example! – j-asano Feb 21 '22 at 22:20

1 Answer

Your stacked bars include 3 groups per area, but you want just one count label per area. As you found, this can be tricky, because ggplot draws one label for each row of the data it is given. You can get around this by passing separate data to geom_col (your original test.data, with 81 rows: 27 areas × 3 phyla) and to geom_text (for which we'll create a new data.frame with 27 rows, one per bar).
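To see where the repeated labels come from, here is a small base R sketch. It assumes the merge was done roughly as described in the question; the name `merged` is made up for illustration. It rebuilds the test data and counts how many rows each area has after merging.

```r
set.seed(123)
test.data <- data.frame(
  area  = rep(1:27, 3),
  cover = c(rnorm(27, mean = 40), rnorm(27, mean = 10), rnorm(27, mean = 5)),
  phyla = rep(c("c", "o", "r"), each = 27)
)
count <- c(14, 17, 15, 15, 15, 12, 18, 14, 13, 14, 12, 14, 12, 11, 13,
           9, 11, 10,  5,  4,  4,  6,  7,  6,  6,  6,  5)

# Merging the counts onto the plotting data repeats each count once per phylum
# (`merged` is a hypothetical name, not from the question)
merged <- merge(test.data, data.frame(area = 1:27, count = count), by = "area")

nrow(merged)              # 27 areas x 3 phyla = 81 rows
table(merged$area)[1:3]   # each area appears 3 times, so geom_text draws 3 labels per bar
```

Since geom_text draws one label per row, each bar gets three copies of its count, one at each segment's `cover` value.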

We'll also need to know the total height of each bar in order to place each label atop it. We can find this by using dplyr::group_by() and summarize() to compute the sum of y values (cover) within each x value (area).

library(tidyverse)

# I'm assuming the `count`s in your question are already in the right order
count.data <- test.data %>% 
  group_by(area) %>% 
  summarize(label_height = sum(cover)) %>% 
  mutate(count = c(14, 17, 15, 15, 15, 12, 18, 14, 13, 14, 12, 14, 12, 11, 13,
                   9, 11, 10,  5,  4,  4,  6,  7,  6,  6,  6,  5)
  )

ggplot() +
  geom_col(data = test.data, aes(x = area, y = cover, fill = phyla)) +
  geom_text(
    data = count.data, 
    aes(x = area, y = label_height, label = count),
    nudge_y = 2              # scoot labels up above bar tops
  ) +
  theme_classic() +
  scale_y_continuous(
    expand = expansion(mult = c(0, 0.05)),
    limits = c(0, 75)
  ) +
  scale_fill_manual(values = c('#a6611a', '#dfc27d', '#80cdc1')) +
  geom_vline(xintercept = c(9.5, 18.5, 27.5), linetype = "dotted") 
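If you would rather not rely on the counts being in the same order as the areas, one option is to keep them in a small lookup table keyed by area and join on that key. This is only a sketch: the `species.counts` data frame is an assumed structure, not something from the question.

```r
library(dplyr)

set.seed(123)
test.data <- data.frame(
  area  = rep(1:27, 3),
  cover = c(rnorm(27, mean = 40), rnorm(27, mean = 10), rnorm(27, mean = 5)),
  phyla = rep(c("c", "o", "r"), each = 27)
)

# Hypothetical lookup table: one row per area, with the area stated explicitly
species.counts <- data.frame(
  area  = 1:27,
  count = c(14, 17, 15, 15, 15, 12, 18, 14, 13, 14, 12, 14, 12, 11, 13,
            9, 11, 10,  5,  4,  4,  6,  7,  6,  6,  6,  5)
)

# Sum cover per area for the label height, then attach counts by key
count.data <- test.data %>%
  group_by(area) %>%
  summarize(label_height = sum(cover)) %>%
  left_join(species.counts, by = "area")

nrow(count.data)  # one row per bar
```

The resulting `count.data` can be passed to geom_text exactly as above. The join keeps each count attached to its area even if the rows are later sorted or filtered.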
zephryl
  • This was great, I missed the bit where you stated that I have to enter the information into geom_col() instead of ggplot() so got stuck there for a while, but figured it out. Thanks so much! – j-asano Feb 22 '22 at 00:33