
I've been trying to create a proportional stacked bar graph with ggplot from a large data set that has two columns: a 0/1 dummy variable and a factor variable with 14 levels. I posted a small sample of the data here.

Despite not having a clear y-variable in my data, I can produce a plot, but it is only useful for the factor levels that have a lot of observations. When a level has only one or two observations, you can't see the proportion at all. The code I used is here.

ggplot(data, aes(x = factor(factor), fill = dummy)) +
  geom_bar()

ggplot says you need to apply a ddply function to the data frame.

ce<-ddply(data,"factor",transform, percent_y=y/sum(y)*100)

Their example doesn't really apply to this data since there's no clear y-variable to call in the plot; there are only counts of how many observations at each factor level are 1 or 0.

My best guess for a ddply function spits out an error about a differing number of rows.

ce<-ddply(plot,"factor(data$factor)",transform,
percent=sum(data$dummy)*100/(dim(data$dummy)[1]))
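
For what it's worth, the 0/1 dummy can serve as the y-variable itself: its mean within each level is the proportion of 1s. Below is a minimal sketch, assuming plyr is installed and using a made-up data frame `df` with hypothetical columns `level` and `dummy` standing in for the real data. Inside `ddply` the columns are referred to by bare name, so each group only sees its own rows; writing `data$dummy` pulls in the whole column for every group, which is one likely source of the row-count error.

```r
library(plyr)

# Made-up stand-in for the real data (hypothetical column names):
# `level` plays the role of the 14-level factor, `dummy` is the 0/1 variable.
df <- data.frame(
  level = factor(rep(c("a", "b", "c"), times = c(8, 4, 2))),
  dummy = c(1, 1, 1, 0, 0, 0, 0, 0,   # level a: 3 of 8 are 1
            1, 1, 1, 0,               # level b: 3 of 4 are 1
            1, 0)                     # level c: 1 of 2 are 1
)

# One row per level: number of observations and percent of rows where dummy == 1.
ce <- ddply(df, "level", summarise,
            n       = length(dummy),
            percent = 100 * mean(dummy))
print(ce)
```

`summarise` collapses each group to a single row, whereas `transform` keeps every original row and would repeat the group's percent on each of them.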
slap-a-da-bias
  • would this help? `geom_bar(position="fill")` generates a plot like this http://docs.ggplot2.org/current/position_fill-2.png – bjoseph Jul 01 '15 at 15:56
  • @bjoseph: Dang. Nice work! That's exactly what was missing. Thank you so much! – slap-a-da-bias Jul 01 '15 at 16:28
  • A follow-up: how would you also put the number of observations for each factor level inside or above each bar? – slap-a-da-bias Jul 01 '15 at 17:42
  • this answer should help get you started http://stackoverflow.com/questions/10112587/re-alignment-of-numbers-on-the-individual-bars-with-ggplot2 – bjoseph Jul 01 '15 at 18:09
  • Tried making a string of values to print on the graph: `Labels<-c("n=1609","n=850","n=594", "n=567", "n=280","n=200","n=199","n=193", "n=106", "n=41", "n=23" , "n=8" , "n=5", "n=2")` followed by `ggplot(data,aes(factor(data$factor),fill=data$dummy))+ geom_bar(position="fill")+ geom_text(aes(label=paste(Labels)))`. But this results in a blank plot and it seems really inefficient. It'd be nice to know a method that will produce the "n"s on a plot with a lot more factor levels so as to avoid tedious coding. I really appreciate the help on this – slap-a-da-bias Jul 01 '15 at 19:31
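
One way to avoid typing the labels by hand is to compute the per-level counts from the data and pass them to `geom_text` as their own small data frame. This is only a sketch under assumptions: `df`, `level`, and `dummy` are made-up stand-ins for the real data, and `counts` and `label` are names introduced here for illustration. The hand-written `Labels` vector probably fails because `geom_text` gets no `y` position and the 14 labels do not line up with the rows of the full data set.

```r
library(ggplot2)

# Made-up stand-in for the real data (hypothetical column names).
df <- data.frame(
  level = factor(rep(c("a", "b", "c"), times = c(8, 4, 2))),
  dummy = c(1, 1, 1, 0, 0, 0, 0, 0,
            1, 1, 1, 0,
            1, 0)
)

# One row per factor level with its number of observations,
# plus a ready-made "n=..." label. Scales to any number of levels.
counts <- as.data.frame(table(level = df$level), responseName = "n")
counts$label <- paste0("n=", counts$n)

p <- ggplot(df, aes(x = level, fill = factor(dummy))) +
  # position = "fill" rescales every bar to height 1, so small levels stay readable
  geom_bar(position = "fill") +
  # Labels come from `counts`, not `df`; inherit.aes = FALSE stops geom_text
  # from looking for the `dummy` fill variable in `counts`.
  geom_text(data = counts,
            aes(x = level, y = 1, label = label),
            vjust = -0.5, inherit.aes = FALSE) +
  labs(x = "factor level", y = "proportion", fill = "dummy")

print(p)
```

Setting `y = 1` places each label at the top of its bar; changing it to something like `y = 0.5` with `vjust = 0.5` would put the label inside the bar instead.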
