
I have event data (A, B, C, and D, below) that occur over 4 locations (1, 2, 3, 4, below). I want to plot them as a stacked bar that is filled in to show the contribution of each event (A, B, C, D) to that location, AND I want to show the integer values of those contributions. I can see the individual values (the code below sort of does that), but I'd also like to see the total for each location, which I can't figure out how to do.

So there are two problems: 1: Printing not only the individual values of a stacked bar but also (or even only) the total value at the top. 2: The text labels get printed at a y offset equal to their own value, so they overwrite each other and don't line up within the bar. I'd prefer them somewhere predictable inside each sub-bar, such as the middle or top.

a <- c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,1,1,1,2)
b <- c('A','B','C','D','A','A','B','C','B','B','C','C','C','D','D','A','A','B','C','D')
df <- data.frame(a, b)

I want to create a summary of this - so here's table()

table(df$a, df$b)

  A B C D
1 2 2 2 1
2 2 1 1 1
3 0 2 2 0
4 1 0 1 2

Now back to a data.frame for plotting with ggplot:

df2 <- data.frame(table(df$a, df$b))

Then plot it:

library(ggplot2)
ggplot(df2, aes(x=Var1, y=Freq, fill=Var2, label=Freq)) + 
  geom_bar(stat="identity") + 
  geom_text(stat="identity")

I would really appreciate help. Do I not need to wrangle my data frame through a table to summarize it and then back into a data frame? Can I get at the total height of the bar and print that label?

I feel like if I weren't using fill, I could get at the ..count.. value with stat="bin", but since I've gone to stat="identity" I can't seem to get at that summary value.

Thanks!

Ullapool
  • [This](http://stackoverflow.com/questions/6644997/showing-data-values-on-stacked-bar-chart-in-ggplot2) and [this](http://stackoverflow.com/questions/23832145/label-error-in-geom-bar) might help – user20650 Jun 10 '14 at 20:58

2 Answers


I would summarize the data like you have in order to produce your desired plot. As for the labels, you need to also create variables that define where your labels should be placed on your graph.

a <- c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,1,1,1,2)
b <- c('A','B','C','D','A','A','B','C','B','B','C','C','C','D','D','A','A','B','C','D')
df <- data.frame(a, b)
df2 <- data.frame(table(df$a, df$b))

Now create a variable for the overall count:

df2$overall <- NA
df2$overall[1:length(unique(df2$Var1))] <- xtabs(Freq~Var1,data=df2)

Now create a variable for the cumulative count within each bar, which gives the y position of each label, using ddply from the plyr package:

library(plyr)
df2 <- ddply(df2, "Var1", transform, cumvars=cumsum(Freq))
# Remove Zeros from printing on labels
df2$Freq2 <- ifelse(df2$Freq==0,NA,df2$Freq)


library(ggplot2)

ggplot(df2, aes(x=Var1, y=Freq, fill=Var2, label=Freq)) + 
  geom_bar(stat="identity") + 
  geom_text(aes(x=Var1, y=overall, label=overall),vjust=-.2,stat="identity") + 
  geom_text(aes(x=Var1, y=cumvars, label=Freq2),vjust=1.5, colour="white", stat="identity")

You can change the size, colour, position, etc. of the labels to make the graph look nice.
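
On ggplot2 2.2.0 and later, bars are stacked in the reverse of the grouping order so that the stack matches the legend, which means hand-computed cumulative positions may no longer line up with the segments. A minimal sketch of an alternative, assuming such a version, lets position_stack() place the labels instead. The names counts, totals, and p are only illustrative:

```r
library(ggplot2)

a <- c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,1,1,1,2)
b <- c('A','B','C','D','A','A','B','C','B','B','C','C','C','D','D','A','A','B','C','D')

# One row per location/event pair; columns are location, event, Freq
counts <- as.data.frame(table(location = a, event = b))

# One row per location holding the total height of its bar
totals <- aggregate(Freq ~ location, data = counts, FUN = sum)

p <- ggplot(counts, aes(x = location, y = Freq)) +
  geom_col(aes(fill = event)) +
  # group = event makes the labels stack in the same order as the bars;
  # vjust = 0.5 centres each label inside its own segment
  geom_text(data = subset(counts, Freq > 0),
            aes(label = Freq, group = event),
            position = position_stack(vjust = 0.5), colour = "white") +
  # totals sit just above each bar
  geom_text(data = totals, aes(label = Freq), vjust = -0.2)
p
```

Dropping the zero-count rows before labelling plays the same role as the Freq2 column above.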

Mark Nielsen

Okay, first let's get some reasonable names, because when your text is always talking about "events" and "locations", but your variable names are a and b, it's easy to be confused. Also, since your locations are categorical, we'll make sure they're coded as a factor.

a <- c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,1,1,1,2)
b <- c('A','B','C','D','A','A','B','C','B','B','C','C','C','D','D','A','A','B','C','D')
df <- data.frame(a, b)
names(df) <- c("location", "event")
df$location <- factor(df$location)

With that cleared up, ggplot will do all the summarizing for us, at least for the bar plot.

library(ggplot2)
ggplot(df, aes(x = location, fill = event)) + geom_bar()

I think we do need to summarize to get the totals:

library(dplyr)
# %>% replaces the older %.% chaining operator, which current dplyr no longer provides
totes <- df %>% group_by(location) %>% summarize(total = n())

ggplot(df, aes(x = location)) + geom_bar(aes(fill = event)) +
    geom_text(data = totes,
              mapping = aes(y = total + .2, label = total))
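
If a separate summary table feels like too much, the totals can probably also be computed by ggplot itself. The sketch below assumes ggplot2 3.3.0 or later (for after_stat()) and adds a text layer that counts rows per location because fill is not mapped in that layer. The name p_totals is only illustrative:

```r
library(ggplot2)

df <- data.frame(
  location = factor(c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,1,1,1,2)),
  event = c('A','B','C','D','A','A','B','C','B','B','C','C','C','D','D','A','A','B','C','D')
)

p_totals <- ggplot(df, aes(x = location)) +
  geom_bar(aes(fill = event)) +
  # stat = "count" tallies rows per x value; with no fill mapping in this
  # layer, the count is the full bar height rather than one segment
  geom_text(aes(label = after_stat(count)), stat = "count", vjust = -0.3)
p_totals
```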

Getting individual sub-bar contributions inside the bars will be trickier, and I'll leave that as an exercise for the reader or for someone else to answer. I'd also encourage you to use something other than a stacked bar plot, so that those numbers can be compared much more easily. Maybe something like this:

df.counts <- df %>% group_by(location, event) %>% summarize(n = n())

ggplot(totes, aes(x = location, y = total)) +
    geom_line(aes(group = 1), size = 1) +
    geom_line(data = df.counts, aes(y = n, color = event, group = event), size = 0.9,
              position = position_jitter(width = 0.05, height = 0.1)) +
    # jitter not pictured, but it helps with the overlapping lines
    expand_limits(y = 0) +
    annotate(geom = "text", x = 2, y = 6, label = "Total", size = 10)


Gregor Thomas
  • Thank you. I will go and learn more about the dplyr package. Thanks! – Ullapool Jun 10 '14 at 21:03
  • I share your concern over how to visualize the data and the questionable value of stacked data. However, lines seem to me to imply some connectivity between the x-axis data points (or location, in this example). The slope between the points seems to speak to a change, a delta, which really doesn't exist in this example. Perhaps dodged bars would be better . . . hmm, thanks for the idea. – Ullapool Jun 10 '14 at 21:48
  • @Ullapool With dodged bars you could plot the total as a horizontal line segment at each location. – Gregor Thomas Jun 10 '14 at 22:56
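
The dodged-bar idea from the comments could be sketched roughly as follows. This is only one possible reading of it: it assumes that a zero-height geom_errorbar() is an acceptable way to draw the horizontal total line, and the names counts, totals, and p_dodge are illustrative:

```r
library(ggplot2)

df <- data.frame(
  location = factor(c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,1,1,1,2)),
  event = c('A','B','C','D','A','A','B','C','B','B','C','C','C','D','D','A','A','B','C','D')
)

# Count every location/event pair; columns are location, event, Freq
counts <- as.data.frame(table(df))
totals <- aggregate(Freq ~ location, data = counts, FUN = sum)

p_dodge <- ggplot(counts, aes(x = location)) +
  geom_col(aes(y = Freq, fill = event), position = "dodge") +
  # ymin == ymax collapses the error bar into a horizontal line at the total
  geom_errorbar(data = totals, aes(ymin = Freq, ymax = Freq), width = 0.9)
p_dodge
```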