I've been browsing and searching for some time and cannot figure out whether I did something wrong here.

I have a simple data frame with some factors and numbers. There is a time variable, of which I use only the year, and I want to plot subsets of this data in separate plots.

I wrote a simple function to subset and plot the data, but I noticed that when I have data at the edges of the range defined by scale_x_continuous, the bars are not plotted but the values are. This is really strange; please see the sample code below, which reproduces the problem (I cannot call it an error yet).

First, the function that creates a PNG with the plot. Its parameters are a data frame, a minimum and maximum year, and optionally a name to be used as a filter (if a name is passed, it is also used in the PNG file name and the plot title).

library(ggplot2)
library(plyr)


# Subset the data to sheep sheared between minYear and maxYear (optionally by
# a single shearer), aggregate by year and write a bar chart to a PNG file.
# Relies on the globals goodBlue and mytheme, which are defined further below.
doPlot <- function(data,minYear,maxYear,name=character(0)) {
  # Defaults for the output file name and plot title (no name filter).
  fname <- sprintf("Performance-%d-%d.png",minYear,maxYear)
  title <- "Performance (Sheared Sheeps)"
  # Keep only sheep, and only the years inside the requested range.
  mysubset <- subset(data, Category == "SHEEP")
  mysubset <- subset(mysubset, Year <= maxYear & Year >= minYear)
  # If a name is passed, change the file name and title and filter by that name.
  if(length(name)!=0) {
    fname <- sprintf("Performance-%d-%d-%s.png",minYear,maxYear,name)
    title <- sprintf("Performance (Sheared Sheeps) - %s",name)
    mysubset <- subset(mysubset,Name == name)
  }
  # Total quantity per year; years without data produce no row.
  byYear <- aggregate(Quantity ~ Year, mysubset, sum)
  png(filename=fname,width = 960, height = 640)
  g <- ggplot(byYear, aes(x=Year,y=Quantity))
  # Bars with the quantity printed just below the top of each bar.
  print(g + geom_bar(stat="identity",fill=goodBlue, colour="black") + 
    ggtitle(title) +
    scale_x_continuous(name="Year", limits=c(minYear,maxYear), breaks=seq(minYear,maxYear,1)) +
    mytheme+
    geom_text(aes(label=Quantity), vjust=1.3,size=6))
  dev.off()
}

Some constants (the data frame and the theme for the plot):

consts = paste('"Category","Year","Name","Quantity"\n',
               'SHEEP,2003,Alice,10\n',
               'SHEEP,2005,Alice,3\n',
               'SHEEP,2008,Alice,2\n',
               'SHEEP,2009,Alice,1\n',
               'SHEEP,2012,Alice,3\n',
               'CACTUS,1997,Bob,45\n',
               'CHICKEN,1997,Bob,6\n',
               'SHEEP,1998,Bob,2\n',
               'SHEEP,2005,Bob,5\n',sep = "")
data <- read.csv(text=consts,header = TRUE)

# Colors and themes for the plot
goodBlue <- "#7fbfff"
darkBlue <- "#3f5f7f"
mytheme <- theme(plot.title = element_text(color=darkBlue,face="bold",size=20),
                   axis.title.x = element_text(color=darkBlue,face="bold",size=16),
                   axis.title.y = element_text(color=darkBlue,face="bold",size=16),
                   axis.text.x = element_text(color=darkBlue,face="bold",size=14),
                   axis.text.y = element_text(color=darkBlue,face="bold",size=12),
                   legend.title = element_text(color=darkBlue,face="bold",size=18),
                   legend.text = element_text(color=darkBlue,face="bold",size=12))

The main code. I will create four plots in total. The first two consider data from 2000 to 2010, one for all shearers and one for Bob alone.

# Consider only this range.
minYear <- 2000
maxYear <- 2010
doPlot(data,minYear,maxYear)
doPlot(data,minYear,maxYear,"Bob")

This works OK.

[Figure: All sheared sheep, from 2000 to 2010]
[Figure: Sheared by Bob the Near-Sighted Shearer, 2000-2010]

Now considering a different range for the years.

minYear <- 2005
maxYear <- 2009
doPlot(data,minYear,maxYear)
doPlot(data,minYear,maxYear,"Bob")

Here are the results: note that the bars on the left and right are not plotted, but the numbers are.

[Figure: All sheared sheep, from 2005 to 2009]
[Figure: Sheared by Bob the Near-Sighted Shearer, 2005-2009]

The data is obviously fake, but similar in structure to what I have. I want the numbers on the bars because eventually I will stack the bars for different categories. I also need to make sure that the plots are comparable, i.e. that they cover the same X range for all plots, even if some subsets of the data have different ranges.

The code was working and I used it to write some reports, until I came across a subset for which this happened. The questions are:

  • If you try to reproduce this, does it happen in your setup too? I am using a Mac, R 3.2.3, RStudio 0.99.879
  • Did I do anything wrong in regard to using geom_bar and/or scale_x_continuous?
  • Is it a graphics issue (it seems the left and right margins are not wide enough to plot the bars) or a conceptual issue (are the extreme bars not meant to be plotted when using limits)?
  • If the best option is to use geom_histogram how could I make sure the bins are exactly equal to one year, considering the range I've specified?
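One way to probe the "margins" hypothesis from the third question is to compute where each bar would start and end. The sketch below uses base R only and rests on two assumptions: that geom_bar draws bars 0.9 data units wide by default when the years are one unit apart, and that limits set through scale_x_continuous censor anything falling outside them. bar_extent is a hypothetical helper written for this illustration, not a ggplot2 function.

```r
# Years that have data in the 2005-2009 run for all shearers (from the sample data).
years  <- c(2005, 2008, 2009)
limits <- c(2005, 2009)

# Assumption: default bar width of 0.9 for x values spaced one year apart.
width <- 0.9

# Hypothetical helper: left and right edge of a bar centred on each x value.
bar_extent <- function(x, width) {
  data.frame(x = x, xmin = x - width / 2, xmax = x + width / 2)
}

ext <- bar_extent(years, width)

# Flag bars with at least one edge beyond the scale limits.
ext$outside <- ext$xmin < limits[1] | ext$xmax > limits[2]
print(ext)
```

Under these assumptions the bars for 2005 and 2009 are flagged, because half of each bar lies past a limit, while the text labels sit exactly on the limit and so remain inside the range. That would be consistent with the labels being drawn and the bars being dropped.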

thanks

Rafael Santos
  • A quick solution: changing "limits=c(minYear,maxYear)" to "limits=c(minYear-1,maxYear+1)" in the scale_x_continuous seems to work, but it looks like a kludge to me. – Rafael Santos Feb 23 '16 at 18:35
  • why are you using a continuous x axis for discrete/categorical data? – hrbrmstr Feb 23 '16 at 18:37
  • @hrbrmstr -- can't I consider a range of years (integers) as continuous? I understand that they are not strictly categorical, e.g. if I have data for 2009, 2010 and 2012 I want the plot to show that the bar height would be 0 for 2011. I assumed that if I considered the data in the X axis to be discrete I would have three bars for the three years I have data on. – Rafael Santos Feb 23 '16 at 18:44
  • one option is to make them factors and just have ggplot not drop missing factors. the other option is to make them dates and use `scale_x_date` and break+format the axis labels to only show the year. – hrbrmstr Feb 23 '16 at 18:49
  • There's a hint on "have ggplot not drop missing factors" here: http://stackoverflow.com/questions/9818835/geom-boxplot-from-ggplot2-forcing-an-empty-level-to-appear -- will check and get back to this post. – Rafael Santos Feb 23 '16 at 19:18
  • `?discrete_scale` (search for `drop`). That maps to `scale_x_discrete`. – hrbrmstr Feb 23 '16 at 19:19
  • Converting years to factors would require other changes (e.g. filtering by a range of years) and it seems counterintuitive to me. I could consider changing the x axis to dates, but is there any relation between the type of the X axis variable and the missing bars in the plot? – Rafael Santos Feb 23 '16 at 19:45
  • you should really reduce that code and text to a minimum. – erc Feb 23 '16 at 21:39
  • @beetroot: sorry, I know it is long but I prefer detailed, self-contained, full code so others can reproduce it. – Rafael Santos Feb 23 '16 at 21:43
  • I am sure the problem would be more easily reproducible with less code. – erc Feb 24 '16 at 06:52
  • I agree with @beetroot and would add that when coming up with a minimal reproducible example the answer often presents itself. – Mist Feb 25 '16 at 03:54
  • I've added a much simpler question which reproduces the problem here: http://stackoverflow.com/questions/35637012/revisiting-rggplotgeom-barscale-x-continuouslimits-leftmost-and-rightmost-b – Rafael Santos Feb 25 '16 at 19:54
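
The two suggestions raised in the comments can be sketched as follows. This is a minimal illustration rather than a confirmed fix. The first option is a variation of the limits kludge that pads the range by half a year instead of a full year, on the assumption that the default bar width is 0.9. The second turns the years into a factor with one level per year and asks scale_x_discrete not to drop unused levels. The variable names padded, p_padded, YearF and p_factor are made up for this example, and byYear is typed in by hand from the sample data.

```r
library(ggplot2)

# Aggregated totals for 2005-2009, all shearers, taken from the sample data.
byYear  <- data.frame(Year = c(2005, 2008, 2009), Quantity = c(8, 2, 1))
minYear <- 2005
maxYear <- 2009

# Option 1: keep the continuous axis but pad the limits by half a year,
# so that a 0.9-wide bar centred on minYear or maxYear fits inside them.
padded <- c(minYear - 0.5, maxYear + 0.5)
p_padded <- ggplot(byYear, aes(x = Year, y = Quantity)) +
  geom_bar(stat = "identity") +
  scale_x_continuous(name = "Year", limits = padded,
                     breaks = seq(minYear, maxYear, 1))

# Option 2: treat the year as a factor with a level for every year in the
# range, and keep the empty levels on the axis with drop = FALSE.
byYear$YearF <- factor(byYear$Year, levels = minYear:maxYear)
p_factor <- ggplot(byYear, aes(x = YearF, y = Quantity)) +
  geom_bar(stat = "identity") +
  scale_x_discrete(name = "Year", drop = FALSE)
```

Printing p_padded or p_factor inside the png()/dev.off() pair of doPlot would write the plot to a file as before. The factor version needs the year filter to be applied before the conversion, which is the extra change mentioned in the comments.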

0 Answers