
This is somewhat related to "R+ggplot+geom_bar+scale_x_continuous+limits: leftmost and rightmost bars not showing on plot", but I decided to write a separate post because the question is different.

I need to create two versions of a plot from the same dataset: one with all values of a specific column and another with filtered values. This is easier to explain with my data frame:

consts = paste('"Category","Year","Name","Quantity"\n', 
               'SHEEP,2003,Alice,10\n',
               'SHEEP,2005,Alice,3\n',
               'SHEEP,2008,Alice,2\n',
               'SHEEP,2009,Alice,1\n',
               'SHEEP,2012,Alice,3\n',
               'CACTUS,1997,Bob,45\n',
               'CHICKEN,1997,Bob,6\n',
               'SHEEP,1998,Bob,2\n',
               'SHEEP,2005,Bob,5\n',sep = "")
data <- read.csv(text=consts,header = TRUE)

Suppose I need to see how many animals (and vegetables :-) were sheared per year. To reuse code, I wrote a function that takes the data set and an optional name as parameters and plots the data as a stacked bar chart, with Year on the x axis, Quantity on the y axis and Category defining the parts of each stack. The function is:

library(ggplot2)

# Plot either all data or select by name.
doPlot <- function(data, name = character(0)) {
  # If we pass a name as parameter we need to change the output file name, the plot title
  # and subset the data.
  fname <- "Performance.png"
  title <- "Performance"
  if (length(name) != 0) {
    fname <- sprintf("Performance-%s.png", name)
    title <- paste(title, "-", name)
    data <- subset(data, Name == name)
  }
  byYear <- aggregate(Quantity ~ Year + Category, data, sum)
  png(filename = fname, width = 960, height = 640)
  g <- ggplot(byYear, aes(x = Year, y = Quantity))
  g <- g + geom_bar(stat = "identity", aes(fill = Category), colour = "black") +
    ggtitle(title) +
    scale_fill_discrete("Category", labels = c("Sheep", "Cactus", "Chicken"), c = 45, l = 80) +
    scale_x_continuous(name = "Year", limits = c(1996, 2013), breaks = seq(1996, 2013, 1)) +
    mytheme +
    # Stack the labels the same way as the bars so each label sits at the top of its
    # own segment (position_stack(vjust) needs ggplot2 >= 2.2.0); vjust = 1.3 then
    # nudges the text down into the bar.
    geom_text(aes(label = Quantity, group = Category),
              position = position_stack(vjust = 1), vjust = 1.3, size = 6)
  print(g)
  dev.off()
}

If you want to reproduce the plots, you will also need some constants for the theme:

# Colors and themes for the plot
goodBlue <- "#7fbfff"
darkBlue <- "#3f5f7f"
mytheme <- theme(plot.title = element_text(color=darkBlue,face="bold",size=20),
                   axis.title.x = element_text(color=darkBlue,face="bold",size=16),
                   axis.title.y = element_text(color=darkBlue,face="bold",size=16),
                   axis.text.x = element_text(color=darkBlue,face="bold",size=14),
                   axis.text.y = element_text(color=darkBlue,face="bold",size=12),
                   legend.title = element_text(color=darkBlue,face="bold",size=18),
                   legend.text = element_text(color=darkBlue,face="bold",size=12))

Calling

doPlot(data)

gives the following plot:

[Figure: Everything sheared by Alice and Bob]

Not exactly what I want: note that the first category, CACTUS, was labeled "Sheep"!
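Why does CACTUS get the "Sheep" label? A minimal sketch, rebuilding three rows of the data inline so it runs on its own: the level order ends up alphabetical, and an unnamed `labels` vector in `scale_fill_discrete()` is matched to those levels by position.

```r
# Rebuild a few rows of the question's data inline so this runs standalone.
data <- read.csv(text = paste0('"Category","Year","Name","Quantity"\n',
                               'SHEEP,2003,Alice,10\n',
                               'CACTUS,1997,Bob,45\n',
                               'CHICKEN,1997,Bob,6\n'))

# Whether read.csv() returns a factor (R < 4.0) or a character column (R >= 4.0),
# the level order ends up alphabetical, as factor() produces here.
lv <- levels(factor(data$Category))
print(lv)  # "CACTUS" "CHICKEN" "SHEEP"

# An unnamed labels vector is paired with the levels by position,
# so the first level (CACTUS) receives the first label ("Sheep").
pairing <- setNames(c("Sheep", "Cactus", "Chicken"), lv)
print(pairing)
```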

I cannot get the desired results when I filter and plot the data with a call to:

doPlot(data,"Alice")

Here is the plot:

[Figure: Alice sheared those]

The legend and colors are correct: all Alice ever sheared were sheep.

What I wanted was:

  1. Ensure that every plot has the same legend, with the entries in the order I want them to appear (Sheep, Cactus, Chicken) and with matching colors in the legend and on the bars;

  2. Ensure that the legend shows all entries even if they are not present in the data being plotted. E.g., the second plot should have the same legend as the first (readers would notice that Cactus/Chicken were part of the data but Alice didn't shear any).

Thanks in advance.

EDIT: I can solve item 1 by setting an explicit order for the factor levels:

data$Category <- factor(data$Category, levels = c("SHEEP", "CACTUS", "CHICKEN"))

The first plot then becomes:

[Figure: Alice wisely avoids cacti]
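Explicit factor levels also matter for item 2: `subset()` and `aggregate()` keep unused factor levels, so Alice's filtered data still carries CACTUS and CHICKEN as levels even though it has no rows for them. A quick check, again rebuilding a few rows inline so it runs standalone:

```r
# Rebuild a few rows of the question's data inline so this runs standalone.
data <- read.csv(text = paste0('"Category","Year","Name","Quantity"\n',
                               'SHEEP,2003,Alice,10\n',
                               'SHEEP,2005,Alice,3\n',
                               'CACTUS,1997,Bob,45\n',
                               'CHICKEN,1997,Bob,6\n'))

# Fix the level order explicitly, as in the EDIT.
data$Category <- factor(data$Category, levels = c("SHEEP", "CACTUS", "CHICKEN"))

# Subsetting to Alice keeps only SHEEP rows...
alice <- subset(data, Name == "Alice")
print(table(alice$Category))  # SHEEP 2, CACTUS 0, CHICKEN 0

# ...but the factor still carries all three levels, and so does the
# aggregated data frame that doPlot() passes to ggplot().
byYear <- aggregate(Quantity ~ Year + Category, alice, sum)
print(levels(byYear$Category))
```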

Asked by Rafael Santos

1 Answer

Grr, found the answer. Just add drop=FALSE to the scale_fill_discrete() line:

scale_fill_discrete("Category",labels=c("Sheep","Cactus","Chicken"),drop=FALSE,c=45, l=80)

Here is the result. I'm answering my own question in the hope that others may benefit.

[Figure: Let's fire Bob and keep Alice]
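Putting the EDIT and the answer together, here is a minimal sketch assuming a recent ggplot2 (it uses `geom_col()`, the shorthand for `geom_bar(stat = "identity")`). It leaves out the file output, theme and text labels of `doPlot()` to isolate the two fixes; `plotShearing()` is a hypothetical name for this sketch only, not the asker's function.

```r
library(ggplot2)

# A few rows of the question's data, rebuilt inline so this runs standalone.
data <- read.csv(text = paste0('"Category","Year","Name","Quantity"\n',
                               'SHEEP,2003,Alice,10\n',
                               'SHEEP,2005,Alice,3\n',
                               'CACTUS,1997,Bob,45\n',
                               'CHICKEN,1997,Bob,6\n',
                               'SHEEP,1998,Bob,2\n'))

# Fix 1 (the EDIT): explicit level order, so labels and colors line up.
data$Category <- factor(data$Category, levels = c("SHEEP", "CACTUS", "CHICKEN"))

# Hypothetical helper for this sketch: builds (does not save) the plot for one subset.
plotShearing <- function(df, title) {
  byYear <- aggregate(Quantity ~ Year + Category, df, sum)
  ggplot(byYear, aes(x = Year, y = Quantity, fill = Category)) +
    # show.legend = TRUE: assumption that recent ggplot2 releases may otherwise
    # leave the legend key blank for levels with no data in this layer.
    geom_col(colour = "black", show.legend = TRUE) +
    ggtitle(title) +
    # Fix 2 (the answer): drop = FALSE keeps unused levels in the legend.
    scale_fill_discrete(name = "Category",
                        labels = c("Sheep", "Cactus", "Chicken"),
                        drop = FALSE)
}

pAll   <- plotShearing(data, "Performance")
pAlice <- plotShearing(subset(data, Name == "Alice"), "Performance - Alice")
```

With both fixes in place, printing `pAlice` should give a legend listing Sheep, Cactus and Chicken, in that order and with the same colors as `pAll`, even though only the SHEEP level has bars.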

Answered by Rafael Santos

Comment: It looks like http://stackoverflow.com/questions/10834382/ggplot2-keep-unused-levels-barplot has also answered the question. – whaleberg Feb 23 '16 at 21:29