
I want to make a simple barplot. I have one categorical variable x (A, B, C, D), another variable y (YES, NO) that I am using for the fill, and a set of observations. I want to display a filled barplot with percentage labels in each column.

Something as simple as this:

[Image: Proper filled barplot with % labels]

So far the ggplot layer system has been a nightmare to use, and I was not able to find a solution in questions that have already been asked.

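library(ggplot2)
library(scales)

x11()  # open a new graphics window (optional)
ggplot(data = KS, aes(x = main_category, fill = state)) +
    geom_bar(position = "fill") +
    scale_y_continuous(labels = percent) +
    # y defaults to the count here, so labels sit at the raw counts, not inside the filled bars
    geom_text(aes(label = ..count.., group = state),
              stat = "count")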

This is what I have so far. Apart from the positioning, it displays the count for every category and state; why can't it display proportions? I also want to avoid manipulating the data or adding columns to the data frame.
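To see why the labels come out as counts, one can inspect what ggplot2 computes for the bar layer. A minimal sketch, assuming ggplot2 3.x and the `KS` data from the edit below (repeated so the block runs on its own): `layer_data()` exposes the computed `count` column, which is the raw count of each category/state pair. A within-category proportion only appears once each count is divided by the total count at its `x` position; the `share` column name is just an illustrative choice.

```r
library(ggplot2)

main_category <- c('A','A','B','C','D','A','A','B','C','D','A','A','B','C','D','A','A','B','C','D')
state <- c('Yes', 'No', 'Yes', 'Yes','Yes', 'No', 'Yes', 'Yes','Yes', 'No', 'Yes', 'Yes','Yes', 'No', 'Yes', 'Yes', 'No', 'No', 'No', 'No')
KS <- data.frame(main_category, state)

p <- ggplot(KS, aes(x = main_category, fill = state)) +
  geom_bar(position = "fill")

# layer_data() returns the data ggplot2 computed for layer 1 (after the stat and position steps)
counts <- layer_data(p, 1)

# `count` is the raw count of each category/state pair; dividing by the
# total count at the same x position gives the share within the category
counts$share <- counts$count / ave(counts$count, counts$x, FUN = sum)
print(counts[, c("x", "count", "share")])
```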

Thanks a lot.

Edit: the requested data frame

library("ggplot2")
library("scales")

main_category=c('A','A','B','C','D','A','A','B','C','D','A','A','B','C','D','A','A','B','C','D')
state=c('Yes', 'No', 'Yes', 'Yes','Yes', 'No', 'Yes', 'Yes','Yes', 'No', 'Yes', 'Yes','Yes', 'No', 'Yes', 'Yes', 'No', 'No', 'No', 'No')
KS = data.frame(main_category, state)
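Before fighting with the plot, it can help to know which percentages the labels should show. A small base-R check (not part of the original question) using `table()` and `prop.table()` with `margin = 1`, which divides each count by its row (category) total:

```r
main_category <- c('A','A','B','C','D','A','A','B','C','D','A','A','B','C','D','A','A','B','C','D')
state <- c('Yes', 'No', 'Yes', 'Yes','Yes', 'No', 'Yes', 'Yes','Yes', 'No', 'Yes', 'Yes','Yes', 'No', 'Yes', 'Yes', 'No', 'No', 'No', 'No')
KS <- data.frame(main_category, state)

# counts of each state within each category (rows = categories, columns = states)
tab <- table(KS$main_category, KS$state)

# margin = 1 divides every cell by its row total
shares <- prop.table(tab, margin = 1)
print(round(100 * shares, 1))
```

For this data, A is 62.5% Yes / 37.5% No, B is 75% / 25%, and C and D are split 50% / 50%; these are the numbers each bar's labels should show.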

Edit 2:

I was able to find my own solution without manipulating the dataset by using implicit ggplot variables:

# share of each state in its category: ..count.. / total ..count.. at the same ..x..
geom_text(aes(y = ..count.. / tapply(..count.., ..x.., sum)[..x..],
              label = percent(..count.. / tapply(..count.., ..x.., sum)[..x..])),
          stat = "count", position = position_fill(0.5), vjust = 0.5)
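The same idea can be written with `after_stat()`, which was introduced in ggplot2 3.3.0; the `..var..` notation was deprecated in ggplot2 3.4.0 but still works. A minimal sketch, assuming ggplot2 >= 3.3.0 and the scales package; `ave(count, x, FUN = sum)` is swapped in for the `tapply(...)[..x..]` lookup and returns, for every row, the total count at that row's `x` position. Because `position_fill()` already rescales each bar to 1, the default `y` (the count) is enough to centre the labels in their segments.

```r
library(ggplot2)
library(scales)

main_category <- c('A','A','B','C','D','A','A','B','C','D','A','A','B','C','D','A','A','B','C','D')
state <- c('Yes', 'No', 'Yes', 'Yes','Yes', 'No', 'Yes', 'Yes','Yes', 'No', 'Yes', 'Yes','Yes', 'No', 'Yes', 'Yes', 'No', 'No', 'No', 'No')
KS <- data.frame(main_category, state)

p <- ggplot(KS, aes(x = main_category, fill = state)) +
  geom_bar(position = "fill") +
  geom_text(
    # count / (total count at the same x) = share of each state in its category
    aes(label = after_stat(percent(count / ave(count, x, FUN = sum)))),
    stat = "count",
    # rescale like the bars and put each label at the middle of its segment
    position = position_fill(vjust = 0.5)
  ) +
  scale_y_continuous(labels = percent)
print(p)
```

No column is added to `KS`; all the arithmetic happens on the data that `stat_count()` computes.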
lucmobz
  • Surely this is at least very similar to questions that have been asked before. You should show what you found searching and explain what is missing in the prior answers. – IRTFM Apr 14 '19 at 22:07
  • Everything that has been asked is about applying percentages based on some preprocessed y-axis quantity, which I don't have, rather than counting the observations. https://stackoverflow.com/questions/44724580/in-rs-ggplot2-how-to-add-percentage-labels-to-a-stacked-barplot-with-percenta – lucmobz Apr 14 '19 at 22:14
  • [Please provide](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) some or all of the data in `KS`. I agree with @42-, many variants of this question have been asked before and I'm sure some combination of existing questions & answers will answer it. – neilfws Apr 14 '19 at 22:27
  • I have added the KS dataframe. – lucmobz Apr 14 '19 at 22:35
  • I'm intentionally refusing to accept the notion that ggplot activity should be constrained by a request to "avoid manipulating the data", since that is an explicit expectation for users of the ggplot system. – IRTFM Apr 15 '19 at 00:58

1 Answer


Given your data, calculate the percentages first, then calculate the respective y-values and plot them as described in the post you linked in the comments:

library("ggplot2")
library("scales")
library(dplyr)

main_category=c('A','A','B','C','D','A','A','B','C','D','A','A','B','C','D','A','A','B','C','D')
state=c('Yes', 'No', 'Yes', 'Yes','Yes', 'No', 'Yes', 'Yes','Yes', 'No', 'Yes', 'Yes','Yes', 'No', 'Yes', 'Yes', 'No', 'No', 'No', 'No')
KS = data.frame(main_category, state)

# share of each state within its main_category
pcnt <- KS %>%
  count(main_category, state) %>%
  group_by(main_category) %>%
  mutate(pcnt = n / sum(n)) %>%
  ungroup()
KS <- merge(KS, pcnt)

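# position_fill() stacks "No" above "Yes" (first fill level on top),
# so each label goes at the vertical middle of its own segment
KS$labelpos <- ifelse(KS$state=='Yes',
                      KS$pcnt/2, 1 - KS$pcnt/2)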


gg <- ggplot(data=KS, aes(x=main_category, fill=state)) 
gg <- gg + geom_bar(position="fill")
# unique(KS) keeps one row per category/state pair, so each label is drawn once
gg <- gg + geom_text(data = unique(KS), aes(label = paste0(100*pcnt,"%"),y=labelpos),size = 3)
gg <- gg + scale_y_continuous(labels = scales::percent)
print(gg)
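If only the label placement is the goal, the manual `labelpos` column can be avoided by plotting a pre-summarised data frame and letting `position_fill(vjust = 0.5)` centre the labels, which is what the comment about `position_fill()` further down asks about. A sketch of the same precompute-first approach, assuming dplyr is installed; the `shares` name is illustrative:

```r
library(ggplot2)
library(scales)
library(dplyr)

main_category <- c('A','A','B','C','D','A','A','B','C','D','A','A','B','C','D','A','A','B','C','D')
state <- c('Yes', 'No', 'Yes', 'Yes','Yes', 'No', 'Yes', 'Yes','Yes', 'No', 'Yes', 'Yes','Yes', 'No', 'Yes', 'Yes', 'No', 'No', 'No', 'No')
KS <- data.frame(main_category, state)

# one row per category/state with its share of the category
shares <- KS %>%
  count(main_category, state) %>%
  group_by(main_category) %>%
  mutate(pcnt = n / sum(n)) %>%
  ungroup()

gg <- ggplot(shares, aes(x = main_category, y = pcnt, fill = state)) +
  geom_col(position = "fill") +
  # same stacking as the bars, each label centred in its segment
  geom_text(aes(label = percent(pcnt)), position = position_fill(vjust = 0.5), size = 3) +
  scale_y_continuous(labels = percent)
print(gg)
```

Because the text layer uses the same groups and the same `position_fill()` as the bars, the labels follow the stacking order automatically instead of relying on hand-computed offsets.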


Simon
  • Thanks, so there's no other way than modifying the dataframe? I hoped some internal variables like ..count.. could be used – lucmobz Apr 15 '19 at 08:02
  • Hi Simon, where on the plot does labelpos show up? I guess my question is also what the logic behind the labelpos variable is. Thanks! – merry123 Oct 28 '22 at 18:15
  • I see what the labelpos variable is now after thinking about it. Is there a way to not calculate the y aesthetic and use position = position_fill(vjust = 0.1) as an alternative? – merry123 Oct 29 '22 at 13:27