I want to put percentage labels on my stacked bar plot. However, I only want to label the 3 largest percentages in each bar. I went through a lot of helpful posts on SO (for example: 1, 2, 3), and here is what I have so far:

library(ggplot2)
groups<-factor(rep(c("1","2","3","4","5","6","Missing"),4))
site<-c(rep("Site1",7),rep("Site2",7),rep("Site3",7),rep("Site4",7))
counts<-c(7554,6982, 6296,16152,6416,2301,0,
          20704,10385,22041,27596,4648, 1325,0,
          17200, 11950,11836,12303, 2817,911,1,
          2580,2620,2828,2839,507,152,2)
tapply(counts,site,sum)
tot<-c(rep(45701,7),rep(86699,7), rep(57018,7), rep(11528,7))
prop<-sprintf("%.1f%%", counts/tot*100)

data<-data.frame(groups,site,counts,prop)

ggplot(data, aes(x=site, y=counts,fill=groups)) + geom_bar()+
  stat_bin(geom = "text",aes(y=counts,label = prop),vjust = 1) +
  scale_y_continuous(labels = percent)

I wanted to insert my output image here, but I don't seem to have enough reputation. The code above should be able to produce the plot, though.

So how can I label only the 3 largest percentages on each bar? Also, is it possible to change the order of the categories in the legend, for example to put "Missing" first? This is not a big issue here, but in my real data set the order of the categories in the legend really bothers me.

I'm new on this site, so if there's anything that's not clear about my question, please let me know and I will fix it. I appreciate any answer/comments! Thank you!

Vivian

1 Answer

I did this in a sort of hacky manner. It isn't that elegant.

Anyways, I used the plyr package, since the split-apply-combine strategy seemed to be the way to go here.

I recreated your data frame with a variable perc that represents the percentage for each site. Then, for each site, I just kept the 3 largest values for prop and replaced the rest with "".

# I added some variables, and added stringsAsFactors=FALSE
data <- data.frame(groups, site, counts, tot, perc=counts/tot,
                   prop, stringsAsFactors=FALSE)

# Load plyr
library(plyr)
# Split on the site variable, and keep all the other variables (is there an
# option to keep all variables in the final result?)
data2 <- ddply(data, ~site, summarize, 
               groups=groups,
               counts=counts, 
               perc=perc,
               prop=ifelse(perc %in% sort(perc, decreasing=TRUE)[1:3], prop, ""))

# I changed some of the plotting parameters
ggplot(data2, aes(x=site, y=perc, fill=groups)) + geom_bar()+
  stat_bin(geom = "text", aes(y=perc, label = prop),vjust = 1) +
  scale_y_continuous(labels = percent)
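
The same top-3 selection can also be sketched without plyr. This is only an illustration, not the method used above: it swaps `ddply` for base R's `ave()`, adds columns in place (so all the other variables are kept), and uses a helper column `rk` that I made up for the example. Ranking with `ties.method = "first"` is an assumption about how ties should be handled; it guarantees exactly three labels per site even if two percentages happen to be equal.

```r
# Rebuild the question's data (values copied from the question).
groups <- rep(c("1", "2", "3", "4", "5", "6", "Missing"), 4)
site   <- rep(c("Site1", "Site2", "Site3", "Site4"), each = 7)
counts <- c(7554, 6982, 6296, 16152, 6416, 2301, 0,
            20704, 10385, 22041, 27596, 4648, 1325, 0,
            17200, 11950, 11836, 12303, 2817, 911, 1,
            2580, 2620, 2828, 2839, 507, 152, 2)
data <- data.frame(groups, site, counts, stringsAsFactors = FALSE)

# Per-site totals and proportions, computed instead of typed in by hand.
data$tot  <- ave(data$counts, data$site, FUN = sum)
data$perc <- data$counts / data$tot

# Rank within each site, largest proportion first (rk is a hypothetical
# helper column, not part of the original answer).
data$rk <- ave(-data$perc, data$site,
               FUN = function(x) rank(x, ties.method = "first"))

# Keep the formatted label for the top three of each site, blank out the rest.
data$prop <- ifelse(data$rk <= 3, sprintf("%.1f%%", data$perc * 100), "")
```

The resulting `data` has the same `perc` and `prop` columns as `data2` above, so it can be passed to the same plotting code.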

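EDIT: Looks like your scales are wrong in your original plotting code. It gave me results with 7500000% on the y axis, which seemed a little off to me... The percent formatter multiplies the axis values by 100, so y has to be a proportion (perc here) rather than the raw counts.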

EDIT: I fixed up the code.
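
On recent ggplot2 versions the plotting call above may no longer run as written, because `geom_bar()` with a `y` aesthetic and `stat_bin(geom = "text")` behave differently now. Below is a minimal sketch of how the same plot could be drawn today. It assumes a ggplot2 version that has `geom_col()` and `position_stack(vjust = ...)`, and it also covers the legend-order part of the question by setting the factor levels explicitly. Placing the labels in the middle of each segment is my choice, not something the original code did.

```r
library(ggplot2)

# Rebuild the question's data; column names follow the code above.
site   <- rep(c("Site1", "Site2", "Site3", "Site4"), each = 7)
groups <- rep(c("1", "2", "3", "4", "5", "6", "Missing"), 4)
counts <- c(7554, 6982, 6296, 16152, 6416, 2301, 0,
            20704, 10385, 22041, 27596, 4648, 1325, 0,
            17200, 11950, 11836, 12303, 2817, 911, 1,
            2580, 2620, 2828, 2839, 507, 152, 2)
perc <- counts / ave(counts, site, FUN = sum)

# Keep a label only for the three largest shares within each site.
top3 <- ave(-perc, site,
            FUN = function(x) rank(x, ties.method = "first")) <= 3
prop <- ifelse(top3, sprintf("%.1f%%", perc * 100), "")

# The legend follows the order of the factor levels, so list "Missing" first.
groups <- factor(groups, levels = c("Missing", "1", "2", "3", "4", "5", "6"))
data2  <- data.frame(site, groups, perc, prop)

p <- ggplot(data2, aes(x = site, y = perc, fill = groups)) +
  geom_col() +                                  # bars from precomputed values
  geom_text(aes(label = prop),
            position = position_stack(vjust = 0.5)) +  # label centred in its segment
  scale_y_continuous(labels = scales::percent)
p
```

Note that reordering the factor levels changes the stacking order of the bars as well as the legend.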

ialm
  • When reproducing this today one needs to put `scales::` before `percent` and `plyr::` before `summarize`. – peer Oct 06 '19 at 21:30