
I am trying to get ggplot to produce a histogram whose bins are 3 months wide: not 90 days, but 3 calendar months. In terms of days, this is unequal-width binning. Note that placing tick marks at 3-month intervals works fine; it is the bin width that I am having problems with. There was quite a bit of discussion here, but I could not find a resolution:

Understanding dates and plotting a histogram with ggplot2 in R
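To make the unequal widths concrete, here is a small base R sketch (the variable names `edges` and `widths` are illustrative and not part of the original post) that prints the length in days of each 3-month bin in 2012:

```r
# Quarter boundaries for 2012, generated with base R's seq() on Date objects
edges <- seq(as.Date("2012-01-01"), as.Date("2013-01-01"), by = "3 months")

# Width of each 3-month bin, in days
widths <- as.numeric(diff(edges))
print(widths)  # 91 91 92 92 (2012 is a leap year)
```

None of the four bins is 90 days wide, so a fixed `binwidth=90` drifts away from the month boundaries.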

Here is a statement of the problem. Note that I could obviously aggregate the results outside of ggplot and then plot them, perhaps as factors in ggplot. But I was looking for an all ggplot solution.

library(ggplot2)
library(scales)  # date_breaks(), date_format()

set.seed(seed=1)
dts<-as.Date('2012-01-01') + round(365*rnorm(500))
dts<-data.frame(d=dts)
g<-ggplot(dts,aes(x=d, y=..count..))

# This isn't what I want: it is 90 days, not 3 months.
# Setting binwidth=' 3 months' also doesn't work.
g + geom_histogram(fill='blue',binwidth=90) +
    scale_x_date(breaks = date_breaks('3 months'),  #seq(as.Date('2008-1-1'), as.Date('2012-3-1'), '3 month'),
                 labels = date_format("%Y-%m"),
                 limits = c(as.Date('2010-1-1'), as.Date('2014-1-1'))) +
    opts(axis.text.x = theme_text(angle=90))

# This doesn't work either. I get:
#   stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
#   Error in `+.Date`(left, right) : binary + is not defined for Date objects
g + geom_bar(fill='blue') +
    stat_bin(breaks=seq(as.Date('2010-1-1'), as.Date('2014-1-1'), '3 month')) +
    scale_x_date(breaks = date_breaks('3 months'),  #seq(as.Date('2008-1-1'), as.Date('2012-3-1'), '3 month'),
                 labels = date_format("%Y-%m"),
                 limits = c(as.Date('2010-1-1'), as.Date('2014-1-1'))) +
    opts(axis.text.x = theme_text(angle=90))

Perhaps the answer is: ggplot will not create 3 month wide (or N month wide) bins.

Alan Berezin
  • I think you should do the aggregation outside `ggplot` ... the consensus from the experts (e.g. on the ggplot mailing list) seems to be that once things get sufficiently complicated, it's better to do the aggregation oneself and then feed the results into `ggplot` (with appropriate `geom`s, you can make the results look exactly the way `ggplot` *would* have plotted them if it were capable of doing unequal bin widths) rather than doing backflips to get it done within `ggplot` (which is after all primarily a *plotting* package). – Ben Bolker Sep 12 '12 at 20:22
  • Why are your limits so offset from your breaks??? – IRTFM Sep 12 '12 at 20:46
  • Thanks Ben for the guidance. DWin: I don't understand why the limits are not strictly enforced, and why there is therefore some overhang and what looks like a half-width bin at each end. Perhaps it is because my limits and labels are at 3-month intervals while the binwidth is 90 days. – Alan Berezin Sep 13 '12 at 20:12
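The aggregate-first approach suggested in the comments might look roughly like the sketch below. It is not from the original thread: it assumes a recent ggplot2 (one that provides `theme()`, `element_text()`, and the `date_breaks`/`date_labels` arguments of `scale_x_date()`), and the names `edges`, `counts`, `start`, `end`, and `p` are illustrative. Dates are counted into 3-month bins with base R's `cut()`, and each bin is then drawn at its true calendar width with `geom_rect()`.

```r
library(ggplot2)

set.seed(1)
dts <- data.frame(d = as.Date("2012-01-01") + round(365 * rnorm(500)))

# 3-month bin edges covering the range of interest (17 edges, 16 bins)
edges <- seq(as.Date("2010-01-01"), as.Date("2014-01-01"), by = "3 months")

# Assign each date to a left-closed bin [edge_i, edge_{i+1});
# dates outside the edges become NA and are not counted by table()
bin <- cut(dts$d, breaks = edges, right = FALSE)
counts <- as.data.frame(table(bin), responseName = "n")

# Attach the calendar start and end of each bin (same order as the factor levels)
counts$start <- edges[-length(edges)]
counts$end <- edges[-1]

# Draw each bin as a rectangle spanning its actual 3-month interval
p <- ggplot(counts, aes(xmin = start, xmax = end, ymin = 0, ymax = n)) +
  geom_rect(fill = "blue") +
  scale_x_date(date_breaks = "3 months", date_labels = "%Y-%m") +
  theme(axis.text.x = element_text(angle = 90))
print(p)
```

Because the counting happens before plotting, the bin edges are entirely under your control and no date-to-number conversion inside ggplot is involved.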

1 Answer


As you noticed, stat_bin allows the bin edges to be specified directly. When working with dates, however, the values often have to be converted to the scale's internal numeric representation by hand before this works. Also, your second example contains both a geom_bar and a stat_bin, which plots two separate layers. Here is a working version:

g + stat_bin(breaks=as.numeric(seq(as.Date('2010-1-1'), 
                                   as.Date('2014-1-1'), '3 month')),
             fill = "blue",
             position = "identity") +
    scale_x_date(breaks = date_breaks('3 months'),
                 labels = date_format("%Y-%m"),
                 limits = c(as.Date('2010-1-1'), as.Date('2014-1-1'))) +
    opts(axis.text.x = theme_text(angle=90))


Note that I've wrapped the breaks argument to stat_bin in as.numeric. Also, I added a position="identity" argument to stat_bin to eliminate the warning about unequal bin widths (since there is only one group, it does not need to be stacked with anything).
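To see what the `as.numeric` wrapper does to the break dates, the conversion can be inspected in base R. This is only an illustrative sketch (`edges` and `num_edges` are names introduced here); it shows that a `Date` is stored as the number of days since 1970-01-01, which is consistent with the numeric scale the breaks are converted to above:

```r
# The same 3-month break dates used in the stat_bin call
edges <- seq(as.Date("2010-01-01"), as.Date("2014-01-01"), by = "3 months")

# as.numeric() exposes the underlying day counts (days since 1970-01-01)
num_edges <- as.numeric(edges)
print(head(num_edges, 3))

# The numeric breaks keep the unequal calendar spacing
print(unique(diff(num_edges)))

# Converting back recovers the original dates
stopifnot(all(as.Date(num_edges, origin = "1970-01-01") == edges))
```

The differences between consecutive numeric breaks are not constant, which is why the bins are unequal in width when measured in days.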

Brian Diggs