
I have been trying to look for an answer to my particular problem but I have not been successful, so I have just made a MWE to post here.

I tried the answers here with no success.

The task I want to do seems easy enough, but I cannot figure it out, and the results I get are making me have some fundamental questions...

I just want to overlay points and error bars on a bar plot, using ggplot2.

I have a long format data frame that looks like the following:

mydf <- data.frame(cell=paste0("cell", rep(1:3, each=12)),
   scientist=paste0("scientist", rep(rep(rep(1:2, each=3), 2), 3)),
   timepoint=paste0("time", rep(rep(1:2, each=6), 3)),
   rep=paste0("rep", rep(1:3, 12)),
   value=runif(36)*100)

I have attempted to get the plot I want the following way:

library(ggplot2)
library(RColorBrewer)  # provides brewer.pal()

myPal <- brewer.pal(3, "Set2")[1:2]
myPal2 <- brewer.pal(3, "Set1")
outfile <- "test.pdf"
pdf(file=outfile, height=10, width=10)
print(#or ggsave()
  ggplot(mydf, aes(cell, value, fill=scientist )) +
  geom_bar(stat="identity", position=position_dodge(.9)) +
  geom_point(aes(cell, color=rep), position=position_dodge(.9), size=5) +
  facet_grid(timepoint~., scales="free_x", space="free_x") +
  scale_y_continuous("% of total cells") +
  scale_fill_manual(values=myPal) +
  scale_color_manual(values=myPal2)
)
dev.off()

But I obtain this:

[image: example]

The problem is, there should be 3 "rep" values per "scientist" bar, but the values are ordered by "rep" instead (they should be 1,2,3,1,2,3, instead of 1,1,2,2,3,3).

Besides, I would like to add error bars with geom_errorbar but I didn't manage to get a working example...

Furthermore, overlaying the actual value points on the bars makes me wonder what is actually being plotted here: are the values taken properly for each bar, and why is the max value (or so it seems) plotted by default?

The way I think this should be properly plotted is with the median (or mean), adding the error bars like the whiskers in a boxplot (min and max value).

Any idea how to...

  • ... have the "rep" value points appear in proper order?
  • ... change the value shown by the bars from max to median?
  • ... add error bars with max and min values?
TobiO
DaniCee
  • "change the value shown by the bars from max to median" & "add error bars with max and min values" - just use boxplot – pogibas Dec 19 '19 at 11:55
  • I'm not sure how you would want to show the error bars. On a per-scientist, per-timepoint basis? Errors calculated according to which statistic? sd? ci? With three replicates I would stick to single points and mean/median – TobiO Dec 19 '19 at 12:47
  • Yeah I wanted to use boxplots, but the people who asked me for this want it that way, what can I say... I think I'll just leave the bars with the points and no error bars – DaniCee Dec 20 '19 at 02:16
  • What I would need is just to plot one bar per Scientist (as in my MWE), with the 3 different points that make it up overlaid on top – DaniCee Dec 20 '19 at 02:29

1 Answer


I restructured your plotting code a little to make things easier. The secret is to use proper grouping (which is otherwise inferred from fill and color). Also, since you're dodging on multiple levels, dodge2 has to be used.

When you are unsure about "what is plotted where" in bar/column charts, it's always helpful to add the option color="black" to the bar layer. Here it reveals that the bars are still drawn on top of each other, because of your use of dodge instead of dodge2.

p = ggplot(mydf, aes(x=cell, y=value, group=paste(scientist,rep))) +
  geom_col(aes(fill=scientist), position=position_dodge2(.9)) +
  geom_point(aes(cell, color=rep), position=position_dodge2(.9), size=5) +
  facet_grid(timepoint~., scales="free_x", space="free_x") +
  scale_y_continuous("% of total cells") +
  scale_fill_brewer(palette = "Set2")+
  scale_color_brewer(palette = "Set1")

ggsave(filename = outfile, plot=p, height = 10, width = 10)

gives: [image]
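To see what the original position_dodge call in the question actually draws, the color="black" check can be sketched as follows. This is only an illustration: the fixed seed, the alpha value and the names `counts` and `p_check` are my own assumptions, not part of the question.

```r
library(ggplot2)

set.seed(1)  # assumption: fixed seed so the sketch is reproducible
mydf <- data.frame(cell=paste0("cell", rep(1:3, each=12)),
   scientist=paste0("scientist", rep(rep(rep(1:2, each=3), 2), 3)),
   timepoint=paste0("time", rep(rep(1:2, each=6), 3)),
   rep=paste0("rep", rep(1:3, 12)),
   value=runif(36)*100)

# Every cell/scientist/timepoint combination holds three rows (one per rep),
# so each dodged bar position receives three bars, not one.
counts <- aggregate(value ~ cell + scientist + timepoint, data = mydf, FUN = length)

# Black outlines plus transparency make the three overlapping bars visible;
# only the tallest one determines the apparent bar height.
p_check <- ggplot(mydf, aes(cell, value, fill = scientist)) +
  geom_col(position = position_dodge(0.9), color = "black", alpha = 0.4) +
  facet_grid(timepoint ~ .)
```

Printing `p_check` should show three outlines inside each bar, which is why the bar appears to show the maximum.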

Regarding error bars

Since there are only three replicates, I would show the original data points and maybe a violin plot. For completeness' sake I also added a geom_errorbar, which with stat="summary" uses the default summary (mean ± standard error).

ggplot(mydf, aes(x=cell, y=value,group=paste(cell,scientist))) +
  geom_violin(aes(fill=scientist),position=position_dodge(),color="black") +
  geom_point(aes(cell, color=rep), position=position_dodge(0.9), size=5) +
  geom_errorbar(stat="summary",position=position_dodge())+
  facet_grid(timepoint~., scales="free_x", space="free_x") +
  scale_y_continuous("% of total cells") +
  scale_fill_brewer(palette = "Set2")+
  scale_color_brewer(palette = "Set1")

gives: [image]
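If bars showing the median with min/max whiskers are required, as asked in the question, one option is to summarise the data first and plot the summary. This is a rough sketch under my own assumptions (the fixed seed and the names `summ`, `value.med`, `value.lo`, `value.hi` and `p_median` are made up for illustration):

```r
library(ggplot2)

set.seed(1)  # assumption: fixed seed so the sketch is reproducible
mydf <- data.frame(cell=paste0("cell", rep(1:3, each=12)),
   scientist=paste0("scientist", rep(rep(rep(1:2, each=3), 2), 3)),
   timepoint=paste0("time", rep(rep(1:2, each=6), 3)),
   rep=paste0("rep", rep(1:3, 12)),
   value=runif(36)*100)

# Median, min and max of the three replicates per cell/scientist/timepoint.
summ <- aggregate(value ~ cell + scientist + timepoint, data = mydf,
                  FUN = function(v) c(med = median(v), lo = min(v), hi = max(v)))
# aggregate() returns a matrix column; flatten it into value.med/value.lo/value.hi.
summ <- do.call(data.frame, summ)

p_median <- ggplot(summ, aes(x = cell, y = value.med, fill = scientist)) +
  geom_col(position = position_dodge(0.9), color = "black") +
  geom_errorbar(aes(ymin = value.lo, ymax = value.hi),
                position = position_dodge(0.9), width = 0.3) +
  # group = scientist keeps the three rep points on top of their own bar
  # instead of letting color = rep split the dodge groups further.
  geom_point(data = mydf, aes(y = value, color = rep, group = scientist),
             position = position_dodge(0.9), size = 3) +
  facet_grid(timepoint ~ .) +
  scale_y_continuous("% of total cells") +
  scale_fill_brewer(palette = "Set2") +
  scale_color_brewer(palette = "Set1")
```

The explicit group = scientist in the point layer is the important part: without it, color = rep adds a grouping level and the points get dodged by rep again.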

Update after comment

As I mentioned in my comment below, the stacking of the percentages leads to an undesirable outcome.

ggplot(mydf, aes(x=paste(cell, scientist), y=value)) +
  geom_bar(aes(fill=rep),stat="identity", position=position_stack(),color="black") +
  geom_point(aes(color=rep), position=position_dodge(.9), size=3) +
  facet_grid(timepoint~., scales="free_x", space="free_x") +
  scale_y_continuous("% of total cells") +
  scale_fill_brewer(palette = "Set2")+
  scale_color_brewer(palette = "Set1")

[image]
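An alternative to stacking, if one bar per scientist with the three replicate points on top is wanted, is to let the bar show a summary statistic. The sketch below uses the mean with standard-error bars via stat_summary and mean_se. It assumes ggplot2 3.3.0 or later for the `fun` argument (older versions used `fun.y`); the seed and the name `p_mean` are my own choices.

```r
library(ggplot2)

set.seed(1)  # assumption: fixed seed so the sketch is reproducible
mydf <- data.frame(cell=paste0("cell", rep(1:3, each=12)),
   scientist=paste0("scientist", rep(rep(rep(1:2, each=3), 2), 3)),
   timepoint=paste0("time", rep(rep(1:2, each=6), 3)),
   rep=paste0("rep", rep(1:3, 12)),
   value=runif(36)*100)

p_mean <- ggplot(mydf, aes(x = cell, y = value, fill = scientist)) +
  # One bar per cell/scientist: height is the mean of the three replicates.
  stat_summary(fun = mean, geom = "col",
               position = position_dodge(0.9), color = "black") +
  # mean_se() returns the mean plus/minus one standard error.
  stat_summary(fun.data = mean_se, geom = "errorbar",
               position = position_dodge(0.9), width = 0.3) +
  # group = scientist keeps all three rep points centred on their bar.
  geom_point(aes(color = rep, group = scientist),
             position = position_dodge(0.9), size = 3) +
  facet_grid(timepoint ~ .) +
  scale_y_continuous("% of total cells") +
  scale_fill_brewer(palette = "Set2") +
  scale_color_brewer(palette = "Set1")
```

With only three replicates the standard error is a very rough estimate, so the overlaid points carry most of the information.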

halfer
TobiO
  • Thanks! I had already done the boxplots, but somehow the people who requested it want it as a bar plot... I think I will leave the error bars out. Your first solution is close, but I still want one bar per Scientist, not 3. So one bar with the 3 different points overlaid – DaniCee Dec 20 '19 at 02:20
  • So the client would like to have a stacked bar chart? What you're seeing in the plot in your question is actually the bars behind each other. If one stacks the bars, the y-axis becomes basically meaningless for the data, and of course the points won't match up. It might make sense to at least have the fill color by replicate and the x-axis also showing the scientist. But in the end, summing up percentages isn't all that meaningful. I really suggest going back to the client and asking what they want to show, or updating your question with an example of the plot you want to achieve. – TobiO Dec 22 '19 at 20:29
  • I added another plot to my answer to illustrate my above comment – TobiO Dec 22 '19 at 20:40
  • Yeah, the last part is what I needed, but in the end I just showed them bar plots with the means, and error bars with the standard errors, as detailed in the R Cookbook here -> http://www.cookbook-r.com/Graphs/Plotting_means_and_error_bars_(ggplot2)/ – DaniCee Dec 26 '19 at 02:31