I am new to ggplot2 (and R) and am trying to make a filled bar chart with labels in each box indicating the percentage composing that block.

Here is an example of my current figure to which I would like to add labels:

##ggplot figure 
library(ggplot2)
library(scales) 

#specify order I want in plots
ZIU$Affinity=factor(ZIU$Affinity, levels=c("High", "Het", "Low"))
ZIU$Group=factor(ZIU$Group, levels=c("ZUM", "ZUF", "ZIM", "ZIF"))

ggplot(ZIU, aes(x=Group))+
geom_bar(aes(fill=Affinity), position="fill", width=1, color="black")+
scale_y_continuous(labels=percent_format())+
scale_fill_manual("Affinity", values=c("High"="blue", "Het"="lightblue", "Low"="gray"))+
labs(x="Group", y="Percent Genotype within Group")+
ggtitle("Genotype Distribution", "by Group")

I would like to add labels centered in each box with the percentage that box represents.

I tried to add labels with the code below, but it keeps producing the error "Error: geom_text requires the following missing aesthetics: y". My plot has no y aesthetic; does this mean I cannot use geom_text? (I am also not sure whether, once the y aesthetic issue is resolved, the rest of the geom_text statement will do what I want: centered white labels in each box.)

ggplot(ZIU, aes(x=Group)) +
geom_bar(aes(fill=Affinity), position="fill", width=1, color="black")+
geom_text(aes(label=paste0(sprintf("%.0f", ZIU$Affinity),"%")),
    position=position_fill(vjust=0.5), color="white")+
scale_y_continuous(labels=percent_format())+
scale_fill_manual("Affinity", values=c("High"="blue", "Het"="lightblue", "Low"="gray"))+
labs(x="Group", y="Percent Genotype within Group")+
ggtitle("Genotype Distribution", "by Group")

Also, if anyone has suggestions for eliminating the NA values, that would be appreciated! I tried

geom_bar(aes(fill=na.omit(Affinity)), position="fill", width=1, color="black")

but was getting the error "Error: Aesthetics must be either length 1 or the same as the data (403): fill, x"

 dput(sample)
 structure(list(Group = structure(c(3L, 3L, 3L, 3L, 3L, 3L, 3L, 
 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 1L, 1L, 1L, 
 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
 2L), .Label = c("ZUM", "ZUF", "ZIM", "ZIF"), class = "factor"), 
StudyCode = c(1, 2, 3, 4, 5, 6, 20, 21, 22, 23, 143, 144, 
145, 191, 192, 193, 194, 195, 196, 197, 10, 24, 25, 26, 27, 
28, 71, 72, 73, 74, 274, 275, 276, 277, 278, 279, 280, 290, 
291, 292), Affinity = structure(c(3L, 2L, 1L, 2L, 3L, 1L, 
1L, 2L, 2L, 2L, 2L, 2L, 3L, 2L, 3L, 2L, 3L, 1L, 1L, 1L, 3L, 
2L, 1L, 2L, 2L, 1L, 2L, 2L, 3L, 3L, 2L, 1L, 3L, 2L, 1L, 3L, 
3L, 2L, 2L, 2L), .Label = c("High", "Het", "Low"), class = "factor")), .Names = c("Group", 
"StudyCode", "Affinity"), row.names = c(NA, 40L), class = c("tbl_df", 
"tbl", "data.frame"))

Thank you so much!

Sarah
  • There are several questions on SO about centering box labels. For example, [here](http://stackoverflow.com/a/34904604/496488) and [here](http://stackoverflow.com/a/6645506/496488). – eipi10 May 15 '17 at 21:11
  • I meant "bar" labels, not "box" labels in the comment above. – eipi10 May 15 '17 at 23:19
  • Hi @eipi10, I looked at both of those posts when trying to solve my problem, but in each case their plot has a y aesthetic, whereas mine does not and hence produces the error message. Since mine is a "part of whole" kind of chart there is nothing I can put on the y -- does this mean I just cannot use geom_text at all? – Sarah May 17 '17 at 12:59
  • Can you post a sample of your data? Paste into your question the output of `dput(data_sample)`. – eipi10 Jun 03 '17 at 05:50
  • @eipi10 added! Let me know if this helps. Thank you! – Sarah Jun 05 '17 at 14:50

1 Answer

The linked examples have a y aesthetic, because the data are pre-summarized, rather than having ggplot do the counting internally. With your data, the analogous approach would be:

library(scales) 
library(tidyverse)

# Summarize data to get counts and percentages
ZIU %>% group_by(Group, Affinity) %>%
  tally %>%
  mutate(percent=n/sum(n)) %>%   # Pipe summarized data into ggplot
  ggplot(aes(x=Group, y=percent, fill=Affinity)) +
   geom_bar(stat="identity", width=1, color="black") +
   geom_text(aes(label=paste0(sprintf("%1.1f", percent*100),"%")), 
             position=position_stack(vjust=0.5), colour="white") +
   scale_y_continuous(labels=percent_format()) +
   scale_fill_manual("Affinity", values=c("High"="blue", "Het"="lightblue", "Low"="gray")) +
   labs(x="Group", y="Percent Genotype within Group") +
   ggtitle("Genotype Distribution", "by Group")
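
On the NA values mentioned in the question: `na.omit(Affinity)` returns a vector shorter than the data, which is why the length error appears. One possible approach is to drop those rows before summarizing. The sketch below uses a small hypothetical data frame (`toy`, standing in for `ZIU`) and assumes the missing values are in `Affinity`; the resulting summary could then be piped into the same ggplot call as above.

```r
library(dplyr)

# Hypothetical toy data standing in for ZIU, with two missing Affinity values
toy <- data.frame(
  Group    = c("ZUM", "ZUM", "ZUM", "ZUF", "ZUF", "ZUF"),
  Affinity = c("High", "Low", NA, "Het", "Het", NA)
)

# Drop rows with missing Affinity, count each Group/Affinity combination,
# then convert the counts to within-group proportions
toy_summary <- toy %>%
  filter(!is.na(Affinity)) %>%
  count(Group, Affinity) %>%
  group_by(Group) %>%
  mutate(percent = n / sum(n)) %>%
  ungroup()

print(toy_summary)
```

Because the filtering happens before the percentages are computed, the proportions within each group still sum to 1 after the NA rows are gone.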

Another option would be to use a line plot, which might make the relative values clearer. Assuming the Group values don't form a natural sequence, the lines are just a guide for following each Affinity value across the different values of Group.

ZIU %>% group_by(Group, Affinity) %>%
  tally %>%
  mutate(percent=n/sum(n)) %>%   # Pipe summarized data into ggplot
  ggplot(aes(x=Group, y=percent, colour=Affinity, group=Affinity)) +
  geom_line(alpha=0.4) +
  geom_text(aes(label=paste0(sprintf("%1.1f", percent*100),"%")), show.legend=FALSE) +
  scale_y_continuous(labels=percent_format(), limits=c(0,1)) +
  labs(x="Group", y="Percent Genotype within Group") +
  ggtitle("Genotype Distribution", "by Group") +
  guides(colour=guide_legend(override.aes=list(alpha=1, size=1))) +
  theme_classic()

eipi10