
I'll use the diamonds data set from ggplot2 to illustrate my point. I want to draw a histogram of price, but I want to show the count for each cut within each bin. This is my code:

ggplot(aes(x = price ) , data = diamonds_df) + 
geom_histogram(aes(fill = cut , binwidth = 1500)) +
stat_bin(binwidth= 1500, geom="text", aes(label=..count..) , 
vjust = -1) + 
scale_x_continuous(breaks = seq(0 , max(stores_1_5$Weekly_Sales) , 1500 ) 
, labels = comma)

In my current plot, the number above each bin is the total count across all cuts. I want to display the count for each cut within each bin instead.

Bonus points if I can also configure the y axis manually, so that it uses a step other than 5000.

Nader Hisham

2 Answers


Update for ggplot2 2.x

You can now center labels within stacked bars without pre-summarizing the data using position=position_stack(vjust=0.5). For example:

ggplot(aes(x = price ) , data = diamonds) + 
  geom_histogram(aes(fill=cut), binwidth=1500, colour="grey20", lwd=0.2) +
  stat_bin(binwidth=1500, geom="text", colour="white", size=3.5,
           aes(label=..count.., group=cut), position=position_stack(vjust=0.5)) +
  scale_x_continuous(breaks=seq(0,max(diamonds$price), 1500))

Original Answer

You can get the counts for each value of cut by adding cut as a group aesthetic to stat_bin. I also moved binwidth outside of aes, which was causing binwidth to be ignored in your original code:

ggplot(aes(x = price ), data = diamonds) + 
  geom_histogram(aes(fill = cut ), binwidth=1500, colour="grey20", lwd=0.2) +
  stat_bin(binwidth=1500, geom="text", colour="white", size=3.5,
           aes(label=..count.., group=cut, y=0.8*(..count..))) +
  scale_x_continuous(breaks=seq(0,max(diamonds$price), 1500))


One issue with the code above is that I'd like the labels to be vertically centered within each bar section, but I'm not sure how to do that within stat_bin, or if it's even possible. Multiplying by 0.8 (or whatever) moves each label by a different relative amount. So, to get the labels centered, I created a separate data frame for the labels in the code below:

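library(dplyr)
library(ggplot2)

# Create text labels: one row per (cut, price bin) with its count
dat = diamonds %>% 
  group_by(cut, 
           # Bin price into 1500-wide intervals labelled by their lower edge
           price=cut(price, seq(0,max(diamonds$price)+1500,1500),
                     labels=seq(0,max(diamonds$price),1500), right=FALSE)) %>%
  summarise(count=n()) %>%
  group_by(price) %>%
  # Vertical midpoint of each stacked segment within a bin
  mutate(ypos = cumsum(count) - 0.5*count) %>%
  ungroup() %>%
  # Convert the bin label back to a number and shift to the bin centre
  mutate(price = as.numeric(as.character(price)) + 750)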

ggplot(aes(x = price ) , data = diamonds) + 
  geom_histogram(aes(fill = cut ), binwidth=1500, colour="grey20", lwd=0.2) +
  geom_text(data=dat, aes(label=count, y=ypos), colour="white", size=3.5)
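As a quick check of the label positions computed above, here is a minimal sketch with made-up counts (the numbers are illustrative, not taken from diamonds). It assumes the segments in a bin are stacked from the bottom in the same order as the rows; if your ggplot2 version stacks them in the opposite order, the counts would need to be reversed before taking the cumulative sum:

```r
# Hypothetical counts for three stacked segments in one bin
count <- c(10, 20, 30)

# Top of each segment is the running total; subtracting half the
# segment height gives the segment's vertical midpoint
ypos <- cumsum(count) - 0.5 * count

# Segments span [0, 10], [10, 30], [30, 60], so the midpoints are 5, 20, 45
print(ypos)
```

Because each label is offset by half of its own segment's height, every label lands in the middle of its segment regardless of how tall the segment is, which is what multiplying by a fixed factor such as 0.8 cannot do.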


To configure the breaks on the y axis, just add scale_y_continuous(breaks=seq(0,20000,2000)) or whatever breaks you'd like.
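A minimal sketch of that, assuming the built-in diamonds data; the break values are only an example and can be any numeric vector:

```r
library(ggplot2)

# Stacked histogram with manually chosen y-axis breaks (every 2000 counts)
p <- ggplot(diamonds, aes(x = price)) +
  geom_histogram(aes(fill = cut), binwidth = 1500, colour = "grey20") +
  scale_y_continuous(breaks = seq(0, 20000, 2000))

p
```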

eipi10
    Of note: the latest ggplot2 syntax (March 2023) prefers `y = after_stat(count)` over `y = ..count..` – tjebo Mar 03 '23 at 13:04
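Following that comment, here is a sketch of the centered-label example rewritten with after_stat(). It assumes ggplot2 3.4.0 or later (after_stat() for computed variables, linewidth for the bar outlines); it has not been checked against older releases:

```r
library(ggplot2)

# Stacked histogram of price by cut, with per-cut counts centered
# inside each bar segment
p <- ggplot(diamonds, aes(x = price)) +
  geom_histogram(aes(fill = cut), binwidth = 1500,
                 colour = "grey20", linewidth = 0.2) +
  stat_bin(binwidth = 1500, geom = "text", colour = "white", size = 3.5,
           aes(label = after_stat(count), group = cut),
           position = position_stack(vjust = 0.5)) +
  scale_x_continuous(breaks = seq(0, max(diamonds$price), 1500))

p
```

Grouping by cut in the text layer is what makes stat_bin compute one count per cut per bin rather than one total per bin.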

As of ggplot2 2.2.0, the position_stack() options make this easier:

library(ggplot2)
s <- ggplot(mpg, aes(manufacturer, fill = class))
s + geom_bar(position = "stack") +  
    theme(axis.text.x = element_text(angle=90, vjust=1)) + 
    geom_text(stat='count', aes(label=..count..), position = position_stack(vjust = 0.5),size=4)
Sant