
I want to create a histogram with multiple data series on the same plot. The best method I have found for this is multhist() from the plotrix package. I would like a plot in a style similar to hist(); ggplot() can also do this, but its default graphics style is not what I want.

Here is some example data:

df <- structure(list(year = c(2011L, 2011L, 2011L, 2011L, 2011L, 2011L, 
2011L, 2011L, 2011L, 2011L, 2011L, 2011L, 2011L, 2011L, 2011L, 
2011L, 2011L, 2011L, 2011L, 2011L, 2011L, 2011L, 2011L, 2012L, 
2012L, 2012L, 2012L, 2012L, 2012L, 2012L, 2012L, 2012L, 2012L, 
2012L, 2012L, 2012L, 2012L, 2012L, 2012L, 2012L, 2012L, 2012L, 
2012L, 2012L, 2012L), count = c(187L, 199L, 560L, 1000L, 850L, 
400L, 534L, 911L, 390L, 1008L, 1173L, 1222L, 810L, 950L, 752L, 
1125L, 468L, 710L, 290L, 670L, 855L, 614L, 1300L, 950L, 670L, 
888L, 490L, 557L, 741L, 700L, 954L, 378L, 512L, 780L, 951L, 398L, 
1544L, 903L, 769L, 1399L, 1021L, 1235L, 1009L, 1222L, 255L)), .Names = c("year", 
"count"), class = "data.frame", row.names = c(NA, -45L))

And here is the code that I have used so far:

library(plotrix)
# Split the counts by year (year is stored as an integer)
d2011 <- df$count[df$year == 2011]
d2012 <- df$count[df$year == 2012]
year <- list(d2011, d2012)
mh <- multhist(year, xlab="Count", ylab="Frequency", main="", cex.axis=1, col=c("dark gray", "light gray"), breaks=seq(0, 1600, by=200))
box(bty="l", col="black")
legend.text <- c("2011", "2012")
# locator(1) waits for a mouse click to position the legend
legend(locator(1), legend=legend.text, col=c("dark gray", "light gray"), pch=15, bty="n", cex=0.8)

This provides me with a 'barplot style' multi histogram, but I am having issues changing two graph parameters.

  1. I would like the plot to look more like a histogram and less like a barplot, so firstly I want to remove (or reduce) the space between the columns. I have tried passing space = NULL, but this argument does not appear to have any effect with multhist.

  2. I would like to change the x-axis so that tick marks fall between the bars and the axis labels are aligned with the tick marks rather than positioned at the bar midpoints. I have tried using axis(side=1, …), but as multhist uses list objects to create plots these commands don’t appear to work.

Any suggestions would be greatly appreciated. Suggestions for other useful graphics packages that can plot histograms with multiple datasets would also be welcomed.

Emily
  • Are you aware that you can change the "graphics style" (using `theme`s) in ggplot2? – Roland Jul 26 '13 at 11:03
  • @Roland thanks for your comment. I have had trouble with themes in ggplot2 before, but maybe it is time to revisit them! It would still be great to figure out how to adjust the plot in multhist if anyone knows an easy way to do this. – Emily Jul 26 '13 at 13:58

2 Answers


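Read the documentation of barplot to understand how to specify zero space. multhist passes extra arguments on to barplot, and with beside=TRUE the space argument takes two numbers: the gap between bars within a group and the gap between groups. Setting both to zero removes all gaps: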

multhist(year, xlab="Count", ylab="Frequency", main="", 
         cex.axis=1, col=c("dark gray", "light gray"), 
         breaks=seq(0,1600, by=200),
         space=c(0,0), beside=TRUE)

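
For the second part of the question (tick marks between the bars, labelled with the bin edges), one possible approach is to skip multhist and build the plot from hist() and barplot() directly. This is only a sketch in base R, not something multhist does for you: the variable names (brks, counts, mids, edges) are made up for illustration, and the tick positions assume the default bar width of 1 together with space = c(0, 0).

```r
# Counts from the question, split by year
d2011 <- c(187, 199, 560, 1000, 850, 400, 534, 911, 390, 1008, 1173, 1222,
           810, 950, 752, 1125, 468, 710, 290, 670, 855, 614, 1300)
d2012 <- c(950, 670, 888, 490, 557, 741, 700, 954, 378, 512, 780, 951, 398,
           1544, 903, 769, 1399, 1021, 1235, 1009, 1222, 255)

# Bin both series with the same breaks, without plotting
brks <- seq(0, 1600, by = 200)
counts <- rbind(
  "2011" = hist(d2011, breaks = brks, plot = FALSE)$counts,
  "2012" = hist(d2012, breaks = brks, plot = FALSE)$counts
)

# Grouped bars with no gaps; barplot() returns the bar midpoints
mids <- barplot(counts, beside = TRUE, space = c(0, 0),
                col = c("darkgray", "lightgray"),
                xlab = "Count", ylab = "Frequency",
                legend.text = TRUE,
                args.legend = list(x = "topleft", bty = "n"))

# With width 1 and no gaps, each bin spans nrow(counts) units on the x-axis,
# so the bin edges sit at 0, 2, 4, ... (assumption: default bar width)
edges <- seq(0, by = nrow(counts), length.out = length(brks))
axis(side = 1, at = edges, labels = brks)
box(bty = "l")
```

Because no names.arg is supplied, barplot() draws no x-axis labels of its own, so the axis() call is the only source of ticks and labels.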

Here is an example with ggplot2 and theme_bw:

library(ggplot2)

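# Overlaid, semi-transparent histograms, one per year.
# closed="right" replaces the older right=TRUE argument in current ggplot2.
ggplot(df, aes(x=count, group=year, fill=as.factor(year))) + 
  geom_histogram(position="identity", alpha=0.5, breaks=seq(0, 1600, by=200), closed="right") +
  scale_fill_discrete(name="Year") +
  theme_bw(base_size=20) +
  xlab("values")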


Or if you really want it like the plot from multhist (which is not as easy to interpret):

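# Side-by-side bars per bin, with x-axis labels at the bin midpoints.
# closed="right" replaces the older right=TRUE argument in current ggplot2.
ggplot(df, aes(x=count, group=year, fill=as.factor(year))) + 
  geom_histogram(position="dodge", breaks=seq(0, 1600, by=200), closed="right") +
  scale_fill_discrete(name="Year") +
  theme_bw(base_size=20) +
  xlab("values") +
  scale_x_continuous(breaks=seq(100, 1500, by=200))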


Roland
  • @Roland thank you for your answer. ggplot2 seems like a good package for this type of histogram. I would also like to solve the issues with multhist, but maybe it is not possible with that plot type! – Emily Jul 26 '13 at 14:34
  • @Emily Added how to change the space in `multhist`. – Roland Jul 26 '13 at 14:47
  • Fantastic, thank you. Spent ages looking through barplot help and couldn't find it! – Emily Jul 26 '13 at 15:05

For superimposed histograms I prefer to use density plots. They're easier on the eyes, especially if you have thinner bins and more cases. With your data, you would get the following:

# geom_density() does not bin the data, so no breaks or right arguments are needed
ggplot(df, aes(x=count, group=year, fill=as.factor(year))) + 
  geom_density(position="identity", alpha=0.5) +
  scale_fill_discrete(name="Year") +
  theme_bw() +
  xlab("values")


dmvianna
  • Thanks for this example. I find it really helpful! Wanted to ask what kind of units are used for "density" (y axis)? – pogibas Dec 08 '13 at 20:32
  • @Pgibas The curve is the result of transforming each data point into a normal distribution and then summing them all together [as explained here](http://en.wikipedia.org/wiki/Kernel_density_estimation). If you find a simple way of explaining its unit, I would like to know about it too.. XD – dmvianna Dec 09 '13 at 01:22
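
On the question of units, one way to look at it is that the density is "probability per unit of x", so the area under the curve is 1. The sketch below uses base R's density() on the 2011 counts from the question, with its default Gaussian kernel and bandwidth; the names area, kde_at and expected are only illustrative. Note that the kernel estimate averages the normal curves (sum divided by n) rather than just summing them.

```r
# 2011 counts from the question
x <- c(187, 199, 560, 1000, 850, 400, 534, 911, 390, 1008, 1173, 1222,
       810, 950, 752, 1125, 468, 710, 290, 670, 855, 614, 1300)

# Kernel density estimate (Gaussian kernel, default bandwidth)
d <- density(x)

# Trapezoid rule: the area under the curve should be close to 1
area <- sum(diff(d$x) * (head(d$y, -1) + tail(d$y, -1)) / 2)

# The same estimate by hand: the average of normal curves centred on each
# data point, each with standard deviation equal to the bandwidth
kde_at <- function(t) mean(dnorm(t, mean = x, sd = d$bw))

# Density times bin width times n gives a rough expected frequency,
# here for a 200-wide bin centred on 900
expected <- kde_at(900) * 200 * length(x)
```

Read this way, a density of 0.001 at x = 900 means roughly 0.1% of the probability mass per unit of count near 900, which is why the y-axis values are so small when the x-axis spans hundreds of units.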