
I am finding it incredibly difficult to add labels to bar plots using ggplot2. I am working with the titanic dataset and have to create additional data frames just to add labels. The whole process is arduous and driving me crazy.

This is what the basic code and chart look like:

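library(dplyr)    # for %>%
library(ggplot2)

# assumes titanic is loaded and Survived is a factor (0 = died, 1 = survived)
titanic %>% ggplot(aes(x = Sex, fill = Survived)) +
  geom_bar() +
  scale_fill_discrete(name = NULL, labels = c("Dead", "Survived")) +
  labs(y = "Number of Passengers", title = "Titanic Survival Rates by Sex")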


As you can see, there are no labels on the bars. Because there is no "y" variable in the aesthetic mappings, a geom_text(aes(label = xxx)) layer does not work, since geom_text() needs a y position for each label. Likewise, without a "y" variable, geom_bar(stat="identity") doesn't work. This is what I did to get around the problem:

# Create a data frame from a two-way table including Survived and Sex

> table(titanic$Survived, titanic$Sex)
   female male
  0     81  468
  1    233  109

rates_by_sex<-data.frame(Sex=c("Female","Male"), Dead=c(81,468), Survived=c(233,109))

# Convert data frame to long format

> library(reshape2)  # melt() comes from the reshape2 package
> rates_by_sex_long <- melt(rates_by_sex, id = "Sex")
> rates_by_sex_long

    Sex     variable   value
1   Female  Dead       81
2   Male    Dead       468
3   Female  Survived   233
4   Male    Survived   109
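The two-way table does not have to be retyped by hand and then melted. As a minimal sketch, calling `as.data.frame()` on a `table` object already returns the long format: one row per Survived/Sex combination, with the counts in a column named `Freq`. The data frame `passengers` below is a small made-up stand-in for the titanic data, not the real dataset.

```r
library(ggplot2)

# Hypothetical stand-in for the titanic data (values made up for illustration)
passengers <- data.frame(
  Survived = factor(c(0, 1, 1, 0, 0, 1, 0)),
  Sex      = c("male", "female", "female", "male", "female", "male", "male")
)

# as.data.frame() on a table gives one row per Survived x Sex combination,
# with the counts in a column named Freq (long format, ready for ggplot2)
rates_long <- as.data.frame(table(Survived = passengers$Survived,
                                  Sex = passengers$Sex))
print(rates_long)

# Same stacked, labelled chart as with the melted data frame
p <- ggplot(rates_long, aes(x = Sex, y = Freq, fill = Survived)) +
  geom_col() +
  geom_text(aes(label = Freq), position = position_stack(vjust = 0.5),
            colour = "white", size = 5)
print(p)
```

With the real data, `as.data.frame(table(Survived = titanic$Survived, Sex = titanic$Sex))` should play the role of both the manual `data.frame()` call and `melt()`.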

With an explicit y column, ggplot2 can now use geom_text() with aes(label=value):

rates_by_sex_long %>% ggplot(aes(x=Sex, y=value, fill=variable)) +
  geom_bar(stat="identity") +
  geom_text(aes(label=value), position = position_stack(vjust=0.5),colour = "white", size = 5) +
  scale_fill_discrete(name=NULL) +
  labs(y="Number of Passengers",title="Titanic Survival Rates by Sex")

Now this gives me the following chart with labels:

Here is another chart I made with the same arduous method, this time to show percentages:

# Manually create a data frame with the rate of survival.

table(titanic$Survived) # Gives raw counts of each category
100*round(prop.table(table(titanic$Survived)),4) # Survival rate in percentages

titanic_survival_rate<-data.frame(Survived=c("Yes","No"),Number=c(342,549), Percent=c(38.38,61.62))

titanic_survival_rate %>% ggplot(aes(x=Survived, y=Number)) + 
  geom_bar(stat="identity",fill="steelblue", colour = "black") +
  geom_text(aes(label=paste0(Percent,"%")),nudge_y=25,colour = "black", size = 4) +
  labs(y="Number of Passengers",title="Titanic Survival Rate")
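For the single-variable percentage chart above, the hard-coded `Number` and `Percent` columns can instead be derived from the raw data. A minimal sketch with dplyr: `count()` produces one row per level with the count in column `n`, and `mutate()` turns those counts into percentages. The data frame `passengers` is a made-up stand-in for `titanic`.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for titanic$Survived (0 = died, 1 = survived)
passengers <- data.frame(Survived = factor(c(0, 1, 1, 0, 0, 1, 0, 0)))

# count() returns one row per level with the count in column n;
# mutate() then derives the percentage from those counts
survival_rate <- passengers %>%
  count(Survived) %>%
  mutate(Percent = round(100 * n / sum(n), 2))

p <- ggplot(survival_rate, aes(x = Survived, y = n)) +
  geom_col(fill = "steelblue", colour = "black") +
  # nudge_y is in data units, so it needs adjusting to the size of the counts
  geom_text(aes(label = paste0(Percent, "%")), nudge_y = 0.25, size = 4) +
  labs(y = "Number of Passengers", title = "Survival Rate")
print(p)
```

Because nothing is typed in by hand, the same pipeline works unchanged for any other categorical column.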


Doing it this way is highly inefficient. There are many charts to make, and constructing a separate data frame for each of them is impractical. I don't even know what I will do when faceting.

Question: How can I get labels (counts and percentages) on bar plots of a categorical variable? I know it can be done with some additional code (e.g., adding something to geom_text()), but I can't quite figure it out.

Please feel free to use this reproducible code:

df<-data.frame(survived=c(1,1,0,0,0,1,0,1,1,0,0,1,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0),sex=c("M","F","M","M","M","F","F","F","M","M","M","F","F","F","M","F","M","F","F","M","M","M","M","M","M","M","M"))

df$survived<-as.factor(df$survived)

df %>% ggplot(aes(x=sex, fill=survived))+geom_bar()+geom_text(aes(label=???))

Pineapple
  • Does this answer your question? https://stackoverflow.com/questions/63653351/how-to-use-stat-count-to-label-a-bar-chart-with-counts-or-percentages-in-ggplo/63656093#63656093 – stefan Jul 03 '21 at 14:20
  • @stefan Thanks for the link. So I guess I got the ..count.. to work even though I don't understand what's going on with it and stat="count". And I absolutely don't understand the code for getting percentages. I want to be able to understand the code and not just copy it ... – Pineapple Jul 03 '21 at 14:38
  • Hm. Each `geom` uses a `stat`. For most geoms this is `stat=identity`, which means the data are used as is, so we have to provide all aesthetics like x and y. geom_bar is different, as it uses `stat=count`, which means that under the hood ggplot computes the counts of the x variable, which are then mapped onto y. That's why we don't need a y aes for geom_bar. Additionally, the computed variables are accessible via e.g. `..count..` – stefan Jul 03 '21 at 14:59
  • ... However, the same logic can be applied to other geoms as well. For example, when setting `stat=count` in geom_text we tell ggplot2 to compute the counts of the x variable and use the computed counts for the y aesthetic, i.e. to place the labels. However, if we want to use the counts as labels as well, we have to tell ggplot2 to do so by mapping `..count..` onto label. – stefan Jul 03 '21 at 15:01
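The approach described in the comments above can be sketched directly on the question's reproducible data. `geom_text()` is given `stat = "count"`, so it computes the same per-group counts as `geom_bar()`, and `after_stat(count)` maps those computed counts onto `label`. `after_stat(count)` is the newer spelling of `..count..` (available since ggplot2 3.3.0; the dot-dot form was deprecated in 3.4.0).

```r
library(ggplot2)

# The reproducible data from the question
df <- data.frame(
  survived = c(1,1,0,0,0,1,0,1,1,0,0,1,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0),
  sex = c("M","F","M","M","M","F","F","F","M","M","M","F","F","F",
          "M","F","M","F","F","M","M","M","M","M","M","M","M")
)
df$survived <- as.factor(df$survived)

p <- ggplot(df, aes(x = sex, fill = survived)) +
  geom_bar() +
  # stat = "count" makes geom_text compute counts per sex/survived group,
  # used as y; after_stat(count) also maps those counts onto label
  geom_text(aes(label = after_stat(count)), stat = "count",
            position = position_stack(vjust = 0.5),
            colour = "white", size = 5)
print(p)
```

Since `fill = survived` splits each bar into groups, the text layer is split the same way, and `position_stack(vjust = 0.5)` puts each label in the middle of its segment.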

1 Answer


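You can prepare the data, including the labels, before plotting. count() puts the counts in a column named n, which geom_col() can use as the bar height and geom_text() as the label.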

library(dplyr)
library(ggplot2)

df %>%
  count(sex, survived) %>%                 # one row per sex/survived pair, count in column n
  ggplot(aes(sex, n, fill = survived)) +
  geom_col() +                             # bar heights taken directly from n
  geom_text(aes(label = n),
            position = position_stack(vjust = 0.5), colour = "white", size = 5)
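The same pattern also covers the percentages asked about in the question. A minimal sketch: after `count()`, grouping by `sex` makes `sum(n)` the total of each bar, so the percentages add up to 100 within each bar, and the label can combine count and percentage. The `label` format string is just one illustrative choice.

```r
library(dplyr)
library(ggplot2)

# The reproducible data from the question
df <- data.frame(
  survived = c(1,1,0,0,0,1,0,1,1,0,0,1,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0),
  sex = c("M","F","M","M","M","F","F","F","M","M","M","F","F","F",
          "M","F","M","F","F","M","M","M","M","M","M","M","M")
)
df$survived <- as.factor(df$survived)

# count() gives one row per sex/survived pair; grouping by sex before
# mutate() makes the percentages sum to 100 within each bar
labelled <- df %>%
  count(sex, survived) %>%
  group_by(sex) %>%
  mutate(pct = round(100 * n / sum(n), 1),
         label = paste0(n, " (", pct, "%)")) %>%
  ungroup()

p <- ggplot(labelled, aes(sex, n, fill = survived)) +
  geom_col() +
  geom_text(aes(label = label),
            position = position_stack(vjust = 0.5), colour = "white", size = 4)
print(p)
```

For faceted charts, the facet variable can be added to both `count()` and `group_by()`, followed by `facet_wrap()` on that variable.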


Ronak Shah