
I want to plot a (faceted) stacked bar plot where the X-axis is in percent, with the frequency labels displayed inside the bars.

After quite some work and reading many different questions on Stack Overflow, I found a way to do this with ggplot2. However, I don't do it directly in ggplot2: I aggregate my data manually with a table call, in a complicated way, and I also calculate the percent values manually with temporary variables (see the source code comment "manually aggregate data").

How can I produce the same plot in a cleaner way, without the manual, complicated data aggregation?

library(ggplot2)
library(scales)

library(gridExtra)
library(plyr)

##
##  Random Data
##
fact1 <- factor(floor(runif(1000, 1,6)),
                      labels = c("A","B", "C", "D", "E"))

fact2 <- factor(floor(runif(1000, 1,6)),
                      labels = c("g1","g2", "g3", "g4", "g5"))

##
##  STACKED BAR PLOT that scales x-axis to 100%
##

## manually aggregate data
##
mytable <- as.data.frame(table(fact1, fact2))

colnames(mytable) <- c("caseStudyID", "Group", "Freq")

mytable$total <- sapply(mytable$caseStudyID,
                        function(caseID) sum(subset(mytable, caseStudyID == caseID)$Freq))

mytable$percent <- round((mytable$Freq/mytable$total)*100,2)

mytable2 <- ddply(mytable, .(caseStudyID), transform, pos = cumsum(percent) - 0.5*percent)


## all case studies in one plot (SCALED TO 100%)

p1 <- ggplot(mytable2, aes(x=caseStudyID, y=percent, fill=Group)) +
    geom_bar(stat="identity") +
    theme(legend.key.size = unit(0.4, "cm")) +
    theme(axis.text.x = element_text(angle = 60, hjust = 1)) +
    geom_text(aes(label = ifelse(Freq > 0, Freq, NA), y = pos), size = 3) # the ifelse guards against printing labels with "0" within a bar


print(p1)


Rich Scriven
mrsteve
  • PS: it might take me up to 14 hours to answer comments or react in any way, as I am in a different time zone compared to most SO users. – mrsteve Aug 27 '14 at 04:08

2 Answers


After you make the data:

fact1 <- factor(floor(runif(1000, 1,6)),
                  labels = c("A","B", "C", "D", "E"))

fact2 <- factor(floor(runif(1000, 1,6)),
                  labels = c("g1","g2", "g3", "g4", "g5"))

dat = data.frame(caseStudyID=fact1, Group=fact2)

You can automate making an unlabeled graph of the kind that you want with position_fill:

ggplot(dat, aes(caseStudyID, fill=Group)) + geom_bar(position="fill")

[image: unlabeled graph]
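If the axis should read in percent rather than as proportions from 0 to 1, one option is a percent label formatter from the scales package. A minimal sketch, assuming the same kind of random data as above (the seed and the name `p_pct` are only there for illustration):

```r
library(ggplot2)

set.seed(1)  # assumption: fixed seed so the sketch is reproducible
dat <- data.frame(
  caseStudyID = factor(sample(c("A", "B", "C", "D", "E"), 1000, replace = TRUE)),
  Group       = factor(sample(paste0("g", 1:5), 1000, replace = TRUE))
)

# position = "fill" rescales every bar to a total height of 1;
# scales::percent only changes how the axis breaks are printed
p_pct <- ggplot(dat, aes(caseStudyID, fill = Group)) +
  geom_bar(position = "fill") +
  scale_y_continuous(labels = scales::percent) +
  ylab("percent")
```

Calling `print(p_pct)` should draw the same bars as before, with the axis labelled 0% to 100%.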

I don't know if there's a way to generate the text labels automatically. The positions and counts from the stacked graph are accessible with ggplot_build, if you want to use what ggplot calculates instead of doing it separately.

p = ggplot(dat, aes(caseStudyID, fill=Group)) + geom_bar(position="fill")
ggplot_build(p)$data[[1]]

That will return a data frame with, among other things, the count, x, y, ymin, and ymax variables, which can be used to create positioned labels.
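As a quick sanity check of that data frame, here is a small sketch. It assumes the same kind of random data as above; the seed and the names `built` and `heights` are my own and not part of ggplot2:

```r
library(ggplot2)

set.seed(1)  # assumption: fixed seed so the sketch is reproducible
dat <- data.frame(
  caseStudyID = factor(sample(c("A", "B", "C", "D", "E"), 1000, replace = TRUE)),
  Group       = factor(sample(paste0("g", 1:5), 1000, replace = TRUE))
)

p <- ggplot(dat, aes(caseStudyID, fill = Group)) + geom_bar(position = "fill")

# data computed for the first (and only) layer: one row per bar segment
built <- ggplot_build(p)$data[[1]]
head(built[, c("x", "count", "ymin", "ymax")])

# with position = "fill", the segment heights of each bar should add up to 1
heights <- tapply(built$ymax - built$ymin, as.integer(built$x), sum)
```

Each row is one coloured segment, so `count` holds the frequency you want to print and `ymin`/`ymax` tell you where the segment sits inside its bar.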

If you want the labels vertically centered in each category, first make a column with values halfway between ymin and ymax.

freq = ggplot_build(p)$data[[1]]
freq$y_pos = (freq$ymin + freq$ymax) / 2

Then add the labels to the graph with annotate.

p + annotate(x=freq$x, y=freq$y_pos, label=freq$count, geom="text", size=3)

[image: labeled graph]
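With newer ggplot2 versions it seems the labels can also be produced without going through ggplot_build. The sketch below assumes ggplot2 3.3.0 or later (for `after_stat()`) and the same kind of random data as above; the seed and the name `p_auto` are only for illustration. The text layer counts the rows itself with `stat = "count"` and is centred in each segment by `position_fill(vjust = 0.5)`:

```r
library(ggplot2)

set.seed(1)  # assumption: fixed seed so the sketch is reproducible
dat <- data.frame(
  caseStudyID = factor(sample(c("A", "B", "C", "D", "E"), 1000, replace = TRUE)),
  Group       = factor(sample(paste0("g", 1:5), 1000, replace = TRUE))
)

p_auto <- ggplot(dat, aes(caseStudyID, fill = Group)) +
  geom_bar(position = "fill") +
  # same counting stat as geom_bar, drawn as text instead of bars;
  # vjust = 0.5 places each label in the middle of its segment
  geom_text(aes(label = after_stat(count)),
            stat = "count",
            position = position_fill(vjust = 0.5),
            size = 3)
```

Combinations that never occur in the data get no row from the count stat, so there should be no "0" labels to filter out.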

user2034412

If you have the distribution of case study IDs in each group as a single vector, you could use the sjp.stackfrq function from the sjPlot package.

A <- floor(runif(1000, 1,6))
B <- floor(runif(1000, 1,6))
C <- floor(runif(1000, 1,6))
D <- floor(runif(1000, 1,6))
E <- floor(runif(1000, 1,6))

mydf <- data.frame(A,B,C,D,E)
sjp.stackfrq(mydf, legendLabels = c("g1","g2", "g3", "g4", "g5"))


The function offers many parameters to easily customize plot appearance (labelling, size and colors etc.).

Daniel