35

When faceting in ggplot, I would often like to plot percentages instead of counts.

e.g.

library(ggplot2)
test1 <- sample(letters[1:2], 100, replace=TRUE)
test2 <- sample(letters[3:8], 100, replace=TRUE)
test <- data.frame(cbind(test1,test2))
ggplot(test, aes(test2))+geom_bar()+facet_grid(~test1)

This is very easy, but if N differs between facet A and facet B, I think it would be better to compare percentages, so that each facet sums to 100%.

How would you achieve this?

Hope my question makes sense.

Sincerely.

Andreas

6 Answers

54

Here is a method that stays within ggplot, using ..count.. and ..PANEL..:

ggplot(test, aes(test2)) + 
    geom_bar(aes(y = (..count..)/tapply(..count..,..PANEL..,sum)[..PANEL..])) + 
    facet_grid(~test1)

As this is computed on the fly, it should be robust to changes to plot parameters.
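
To see what the `tapply(...)[..PANEL..]` expression computes, here is a minimal base R sketch with made-up numbers. The vectors `count` and `panel` are hypothetical stand-ins for ggplot's computed `..count..` and `..PANEL..`; they are not taken from the plot itself.

```r
# Hypothetical stand-ins for ..count.. and ..PANEL.. (two facets, two bars each)
count <- c(10, 30, 5, 15)
panel <- factor(c(1, 1, 2, 2))

# Total count per panel: 40 for panel 1, 20 for panel 2
panel_totals <- tapply(count, panel, sum)

# Indexing the totals by panel repeats each total once per bar,
# so every bar is divided by the total of its own facet
share <- as.numeric(count / panel_totals[panel])

# For contrast, dividing by sum(count) normalises over all facets together
share_overall <- count / sum(count)

print(share)
print(share_overall)
```

In this sketch `share` sums to 1 within each panel, while `share_overall` sums to 1 only across both panels combined.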

James
  • This is a great approach. Do you think it's possible to add percentage labels to each bar that add to 100% in each facet? – marbel Dec 16 '13 at 00:32
  • @MartínBel It seems that `geom_text` doesn't work with computed variables. You might want to post as a separate question. – James Dec 16 '13 at 16:02
  • 2
    Sure. Here is the [question](http://stackoverflow.com/questions/20600900/r-faceted-bar-chart-with-percentages-labels-independent-for-each-plot) I leave it here for future reference. – marbel Dec 16 '13 at 22:19
  • so very helpful. Thanks a lot – hyunwoo jeong Apr 29 '16 at 01:54
  • If using stat='identity', then `..y../tapply(..y.., ..PANEL.., sum)[..PANEL..]` works. – krassowski Feb 23 '20 at 12:59
21

Try this:

library(plyr)
# first make a data frame with frequencies
df <- as.data.frame(with(test, table(test1,test2)))
# or with count() from the plyr package, as Hadley suggested
# (note that count() names the frequency column `freq`, not `Freq`):
# df <- count(test, vars=c('test1', 'test2'))
# next: compute percentages per group
df <- ddply(df, .(test1), transform, p = Freq/sum(Freq))
# and plot (stat = "identity" because p is already computed)
ggplot(df, aes(test2, p))+geom_bar(stat = "identity")+facet_grid(~test1)

You could also add + scale_y_continuous(formatter = "percent") to the plot for ggplot2 version 0.8.9, or + scale_y_continuous(labels = percent_format()) for version 0.9.0.
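
The same per-facet proportions can also be computed without plyr. The following is only a sketch using base R's `prop.table()`; the seed, the column name `p`, and the object name `plt` are illustrative choices, and the plot is drawn only if ggplot2 happens to be installed.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible
test <- data.frame(
    test1 = sample(letters[1:2], 100, replace = TRUE),
    test2 = sample(letters[3:8], 100, replace = TRUE)
)

# Cross-tabulate, then divide each test1 row by its row total
tab <- with(test, table(test1, test2))
df <- as.data.frame(prop.table(tab, margin = 1), responseName = "p")

# Each facet (level of test1) should now sum to 1
facet_sums <- tapply(df$p, df$test1, sum)
print(facet_sums)

# Plot only if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
    library(ggplot2)
    plt <- ggplot(df, aes(test2, p)) +
        geom_bar(stat = "identity") +
        facet_grid(~test1)
    print(plt)
}
```

Here `margin = 1` normalises over the rows of the table, which correspond to the levels of `test1`, i.e. to the facets.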

joran
daroczig
  • This is a much better solution. +1 – Chase Jan 18 '11 at 15:26
  • @Chase and @Andreas: thank you! I have just posted a simpler (and I think: nicer) method based on this question: http://stackoverflow.com/q/3695497/564164 – daroczig Jan 18 '11 at 15:58
  • 2
    Try using `count` instead of `as.data.frame(table(...))` - it's much faster and doesn't turn all tabulating variables into factors. – hadley Jan 18 '11 at 18:25
  • @hadley: thank you for pointing my attention to this useful function. I made a note of it for the future! – daroczig Jan 18 '11 at 18:30
  • 1
    For ggplot2 v.1.0.1, that last part needs to be: `+ scale_y_continuous(labels = percent)` – Owen Aug 06 '15 at 20:07
7

A very simple way:

ggplot(test, aes(test2)) + 
    geom_bar(aes(y = (..count..)/sum(..count..))) + 
    facet_grid(~test1)

So I only changed the parameter of geom_bar to aes(y = (..count..)/sum(..count..)). After setting ylab to NULL and specifying the formatter, you could get:

ggplot(test, aes(test2)) +
    geom_bar(aes(y = (..count..)/sum(..count..))) + 
    facet_grid(~test1) +
    scale_y_continuous('', formatter="percent")

Update: Note that while formatter = "percent" works for ggplot2 version 0.8.9, in 0.9.0 you'd want something like scale_y_continuous(labels = percent_format()).

joran
daroczig
  • ghezus - I actually think somebody on SO answered this for me before. Embarrassing for me - hopefully this will come up in searches from now on. Thanks again. – Andreas Jan 19 '11 at 08:37
  • 4
    actually is scale_y_continuous(labels = percent) (using the scales package) – dickoa Mar 24 '12 at 21:30
  • can someone help me understand the meaning of .. in the above statement (..count..)/sum(..count..) ?? – Abhi Jul 20 '12 at 21:10
  • 1
    @Abhi - it's ggplot2's notation for variables computed by a stat: "To use these variables in an aesthetic mapping, you need to surround them with .., like aes(x = ..output..). This tells ggplot that the variable isn't in the original dataset, but has been created by the statistic." see: http://had.co.nz/ggplot2/stat_sum.html – Andreas Sep 02 '12 at 11:01
  • This will make the percentages add to 100 for the whole plot, not for each facet. Maybe something new in the latest version of ggplot2 – Andreas Sep 02 '12 at 11:08
  • 6
    In version 0.9.3 of ggplot2 this does not work. Instead of adding up each facet to 100%, this adds *all* facets to 100%. – Sim Dec 15 '12 at 06:31
  • 2
    My previous comment is incorrect. I am seeing a very strange behavior: starting from a fresh R session, this works. Starting within a project that loads a whole bunch of libraries, this does not work. Instead of adding up each facet to 100%, this adds *all* facets to 100%. Looks like a bug in ggplot2--something is getting confused. – Sim Dec 15 '12 at 06:51
1

Thank you for sharing the PANEL "tip" on the ggplot method.

For information: you can also produce percentages on the y axis within a single bar chart, by using ..count.. and ..group.. in the same way:

library(scales)  # provides percent
ggplot(test, aes(test2,fill=test1)) +
   geom_bar(aes(y = (..count..)/tapply(..count..,..group..,sum)[..group..]), position="dodge") +
   scale_y_continuous(labels = percent)
trincot
Lilly
  • Whilst others may feel that this would perhaps be better suited as a comment and not an answer, I gave this answer a +1 because it helped me solve a problem I was having and wanted to thank Lilly for posting this. – paleo13 Jul 05 '17 at 00:41
1

Here's a solution that should get you moving in the right direction. I'm curious to see if there are more efficient ways to go about doing this, as this seems a bit hacky and convoluted. We can use the built-in ..density.. variable for the y aesthetic, but factors don't work there. So we also need to use scale_x_discrete to label the axis appropriately once we have converted test2 into a numeric object.

ggplot(data = test, aes(x = as.numeric(test2)))+ 
geom_bar(aes(y = ..density..), binwidth = .5)+ 
scale_x_discrete(limits = sort(unique(test$test2))) + 
facet_grid(~test1) + xlab("Test 2") + ylab("Density") 

But give this a whirl and let me know what you think.

Also, you can shorten your test data creation like so, which avoids the extra objects in your environment and having to cbind them together:

test <- data.frame(
    test1 = sample(letters[1:2], 100, replace = TRUE), 
    test2 = sample(letters[3:8], 100, replace = TRUE)
)
Chase
  • 1
    yeah a bit convoluted - but still thanks - better than what I had :-) I don't know if this should be a feature of ggplot. I could imagine many situations where it would be better than plotting counts. On the other hand - it might be best to keep data munging and graphing separate :-) – Andreas Jan 18 '11 at 15:49
  • Density is not the same as percentage. – russellpierce Jul 15 '14 at 18:58
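
The distinction raised in the last comment can be checked with a small base R sketch. It uses `hist()` rather than ggplot2, and the vector `x` is a made-up example, so treat it as an illustration of the arithmetic only: a density is the count divided by (n × bin width), so with a bin width of 0.5 it comes out at twice the proportion.

```r
# Hypothetical numeric codes for a categorical variable (1, 2, 3)
x <- c(1, 1, 2, 3)

# Bins of width 0.5, chosen so that each code falls in its own bin
breaks <- seq(0.75, 3.25, by = 0.5)
h <- hist(x, breaks = breaks, plot = FALSE)

proportion <- h$counts / sum(h$counts)  # sums to 1
dens <- h$density                       # counts / (n * bin width)

# With a bin width of 0.5, each density is twice the proportion
print(rbind(proportion, dens))
```

Only when the bin width is exactly 1 do the two quantities coincide.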
0

I deal with similar situations quite frequently, but take a very different approach that uses two of Hadley's other packages, namely reshape and plyr, primarily because I prefer looking at things as 100% stacked bars (when they total to 100%).

test <- data.frame(sample(letters[1:2], 100, replace=T), sample(letters[3:8], 100, replace=T))
colnames(test) <- c("variable","value")
test <- cast(test, variable + value ~ .) 
colnames(test)[3] <- "frequ"

test <- ddply(test,"variable", function(x) {
    x <- x[order(x$value),]
    x$cfreq <- cumsum(x$frequ)/sum(x$frequ)
    x$pos <- (c(0,x$cfreq[-nrow(x)])+x$cfreq)/2
    x$freq <- (x$frequ)/sum(x$frequ)
    x
})

plot.tmp <- ggplot(test, aes(variable,frequ, fill=value)) + geom_bar(stat="identity", position="fill") + coord_flip() + scale_y_continuous("", formatter="percent")
Brandon Bertelsen