I'm attempting to use ggplot and R for analysing some epidemiologic data, and I'm continuing to struggle with getting an epidemic curve to appear properly.
Data is here
library(ggplot2)
library(scales)  # provides date_breaks() and date_format() used below
head(epicurve)
       onset age
1 21/12/2012  18
2 14/06/2013   8
3 10/06/2013  64
4 28/05/2013  79
5 14/04/2013  56
6  9/04/2013  66
epicurve$onset <- as.Date(epicurve$onset, format="%d/%m/%Y")
ggplot(epicurve, aes(onset)) +
  geom_histogram() +
  scale_x_date(breaks = date_breaks("1 year"),
               minor_breaks = date_breaks("1 month"),
               labels = date_format("%b-%Y"))
This gives a graph that is fine, but the bin widths don't correspond to any meaningful time period, and adjusting them is a matter of trial and error.
For this particular dataset, I'd like to display the cases by month of onset.
One way I worked out how to do this is:
epicurve$monyr <- format(epicurve$onset, "%b-%Y")
epicurve$monyr <- as.factor(epicurve$monyr)
ggplot(epicurve, aes(monyr)) + geom_bar()  # counts per level; recent ggplot2 rejects geom_histogram() on a factor
This outputs a graph I can't post because of the reputation system. The bars represent something meaningful, but the axis labels are a bomb-site. I can't format the axes using scale_x_date because they aren't dates, and I can't work out what arguments to pass to scale_x_discrete to give useful labels.
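As far as I can tell, at least part of the label problem is that the factor levels end up sorted alphabetically rather than chronologically. Here is a small sketch using only the six onset dates shown above. The `onset` vector is rebuilt purely for illustration, and the month abbreviations produced by `%b` depend on the locale:

```r
# Six onset dates from the head() output above, rebuilt for illustration only
onset <- as.Date(c("21/12/2012", "14/06/2013", "10/06/2013",
                   "28/05/2013", "14/04/2013", "9/04/2013"),
                 format = "%d/%m/%Y")

# Same month-year labelling as in my attempt
monyr <- as.factor(format(onset, "%b-%Y"))

# as.factor() sorts the character labels, so the levels are not in date order
levels(monyr)

# The same labels ordered by the underlying dates, for comparison
chrono <- unique(format(sort(onset), "%b-%Y"))
chrono
```

Comparing the two printed vectors shows the order the axis currently uses against the order I actually want.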
I have a feeling there should be an easier way to do this by doing an operation on the onset column. Can anyone give me any pointers, please?
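In case it helps anyone reproduce this, here is a minimal sketch that rebuilds just the six rows printed by head() above (not the full dataset) and applies the same date conversion. The `stringsAsFactors = FALSE` setting is my assumption, so that onset starts out as plain character:

```r
# Rebuild only the six rows shown by head(); the real dataset has more rows
epicurve <- data.frame(
  onset = c("21/12/2012", "14/06/2013", "10/06/2013",
            "28/05/2013", "14/04/2013", "9/04/2013"),
  age   = c(18, 8, 64, 79, 56, 66),
  stringsAsFactors = FALSE  # assumption: keep onset as character before conversion
)

# Convert day/month/year strings to Date, as in the question
epicurve$onset <- as.Date(epicurve$onset, format = "%d/%m/%Y")

# Inspect the resulting column types
str(epicurve)
```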