I would like to ask a follow-up question about the answer given in the post "Gantt style time line plot (in base R)" on Gantt plots in base R. I think this is worth a new question because these plots have broad appeal, a new question may attract more attention, and I need more space than the comments on that question allow to be specific.

The following code was given by @digEmAll. It takes a dataframe with columns for a start time, an end time, and a grouping variable, and turns it into a Gantt plot. I have modified @digEmAll's function very slightly so that the bars/segments in the Gantt plot are contiguous rather than separated by a gap. Here it is:

plotGantt <- function(data, res.col='resources', 
                      start.col='start', end.col='end', res.colors=rainbow(30))
{
  #slightly enlarge Y axis margin to make space for labels
  op <- par('mar')
  par(mar = op + c(0,1.2,0,0)) 

  minval <- min(data[,start.col])
  maxval <- max(data[,end.col])

  res.colors <- rev(res.colors)
  resources <- sort(unique(data[,res.col]),decreasing=T)


  plot(c(minval,maxval),
       c(0.5,length(resources)+0.5),
       type='n', xlab='Duration',ylab=NA,yaxt='n' )
  axis(side=2,at=1:length(resources),labels=resources,las=1)
  for(i in 1:length(resources))
  {
    yTop <- i+0.5
    yBottom <- i-0.5
    subset <- data[data[,res.col] == resources[i],]
    for(r in 1:nrow(subset))
    {
      color <- res.colors[((i-1)%%length(res.colors))+1]
      start <- subset[r,start.col]
      end <- subset[r,end.col]
      rect(start,yBottom,end,yTop,col=color)
    }
  }
  par(mar=op) # reset the plotting margins
}

Here are some sample data. There are four possible groups, 1-4, but not every dataframe contains all four: some have only two and some have only three.

mydf1 <- data.frame(startyear=2000:2009, endyear=2001:2010, group=c(1,1,1,1,2,2,2,1,1,1))
mydf2 <- data.frame(startyear=2000:2009, endyear=2001:2010, group=c(1,1,2,2,3,4,3,2,1,1))
mydf3 <- data.frame(startyear=2000:2009, endyear=2001:2010, group=c(4,4,4,4,4,4,3,2,3,3))
mydf4 <- data.frame(startyear=2000:2009, endyear=2001:2010, group=c(1,1,1,2,3,3,3,2,1,1))

Here I run the above function, but specify four colors for plotting:

plotGantt(mydf1, res.col='group', start.col='startyear', end.col='endyear', 
          res.colors=c('red','orange','yellow','gray99'))

plotGantt(mydf2, res.col='group', start.col='startyear', end.col='endyear', 
          res.colors=c('red','orange','yellow','gray99'))

plotGantt(mydf3, res.col='group', start.col='startyear', end.col='endyear', 
          res.colors=c('red','orange','yellow','gray99'))

plotGantt(mydf4, res.col='group', start.col='startyear', end.col='endyear', 
          res.colors=c('red','orange','yellow','gray99'))

These are the plots:

[Image: My Gantt plots]

What I would like to do is modify the function so that:

1) It plots all four groups on the y-axis, regardless of whether they actually appear in the data.

2) It uses the same color for each group in every plot, regardless of how many groups are present. As you can see, mydf2 has four groups and all four colors are plotted (1-red, 2-orange, 3-yellow, 4-gray). The same colors go with the same groups for mydf3, because it contains only groups 2, 3 and 4 and the colors are picked in reverse order. However, mydf1 and mydf4 get different colors for each group because they have no group 4. Gray is still the first color chosen, but now it is used for the highest-numbered group present, which is drawn lowest on the plot (group 2 in mydf1 and group 3 in mydf4).

It appears to me that the main thing I need to work on is the vector 'resources' inside the function, so that it contains all possible groups rather than just the unique groups present in the data. When I try manually overriding it to make sure it contains all the groups, e.g. with something as simple as resources <- as.factor(1:4), I get an error:

'Error in rect(start, yBottom, end, yTop, col = color) : cannot mix zero-length and non-zero-length coordinates'

Presumably the for loop does not know how to plot data that do not exist for groups that don't exist.
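
A minimal sketch of what seems to be going on, assuming the subset for a missing group is a dataframe with zero rows (the name `empty` is made up for illustration). With no rows, 1:nrow() counts down from 1 to 0 instead of producing an empty sequence, so the inner loop still runs, and indexing row 0 returns a zero-length value:

```r
# A dataframe with no rows, like the subset for a group that is absent
empty <- data.frame(startyear = numeric(0), endyear = numeric(0))

1:nrow(empty)          # c(1, 0): the loop body would run twice
seq_len(nrow(empty))   # integer(0): the loop body would not run at all

empty[0, 'startyear']  # numeric(0), a zero-length coordinate for rect()
```

If that is the cause, looping over seq_len(nrow(subset)) instead of 1:nrow(subset) should skip groups that have no rows.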

I hope that this is a replicable/readable question and it's clear what I'm trying to do.

EDIT: I realize that to solve the color problem, I could just specify the colors for the 3 groups that exist in each of these sample dfs. However, my intention is to use this plot as an output to a function whereby it wouldn't be known ahead of time if all of the groups exist for a particular df.

jalapic
  • There are no Gantt plots in "base R". The closest would be `gantt.chart` in package plotrix. – IRTFM Oct 15 '14 at 05:03
  • @BondedDust did you even read past the first sentence? – shadowtalker Oct 15 '14 at 05:14
  • I did. I offered a reference to an implementation. The request for a function that would be able to "know" about objects that were not being passed to it seemed nonsensical. – IRTFM Oct 15 '14 at 05:18
  • I'd revise this question as it is not minimal. Right now you've included so much code and info that it's difficult to figure out what you want without spending a great deal of time parsing the question. – Tyler Rinker Oct 15 '14 at 12:03

1 Answer

I slightly modified your function to account for NA in the start and end dates:

plotGantt <- function(data, res.col='resources', 
                      start.col='start', end.col='end', res.colors=rainbow(30))
{
  #slightly enlarge Y axis margin to make space for labels
  op <- par('mar')
  par(mar = op + c(0,1.2,0,0)) 

  minval <- min(data[,start.col],na.rm=T)
  maxval <- max(data[,end.col],na.rm=T)

  res.colors <- rev(res.colors)
  resources <- sort(unique(data[,res.col]),decreasing=T)


  plot(c(minval,maxval),
       c(0.5,length(resources)+0.5),
       type='n', xlab='Duration',ylab=NA,yaxt='n' )
  axis(side=2,at=1:length(resources),labels=resources,las=1)
  for(i in 1:length(resources))
  {
    yTop <- i+0.5
    yBottom <- i-0.5
    subset <- data[data[,res.col] == resources[i],]
    for(r in 1:nrow(subset))
    {
      color <- res.colors[((i-1)%%length(res.colors))+1]
      start <- subset[r,start.col]
      end <- subset[r,end.col]
      rect(start,yBottom,end,yTop,col=color)
    }
  }
  par(mar=op) # reset the plotting margins
  invisible()
}

This way, if you simply append all your possible group values to your data, they will all be printed on the y axis, e.g.:

mydf1 <- data.frame(startyear=2000:2009, endyear=2001:2010,
                    group=c(1,1,1,1,2,2,2,1,1,1))
# add all the group values you want to print with NA dates
mydf1 <- rbind(mydf1,data.frame(startyear=NA,endyear=NA,group=1:4))

plotGantt(mydf1, res.col='group', start.col='startyear', end.col='endyear', 
          res.colors=c('red','orange','yellow','gray99'))

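About the colors: at the moment the ordered res.colors are applied to the sorted groups, so the 1st color in res.colors is applied to the 1st (sorted) group and so on. This holds as long as every possible group is present in the data (for example, appended with NA dates as above) and res.colors has one color per group; in that case each group gets the same color in every plot.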
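
If you would rather not depend on which groups are present, one option is to look each color up by the group's position in the full set of possible groups. This is only a sketch: groupColor and all.groups are hypothetical names, not part of the function above.

```r
# Hypothetical helper (not part of plotGantt): pick each group's color by its
# position in the full set of possible groups, so the mapping does not depend
# on which groups happen to be present in a given dataframe.
groupColor <- function(group, all.groups, res.colors) {
  idx <- match(group, all.groups)                     # position in the full set
  res.colors[((idx - 1) %% length(res.colors)) + 1]   # recycle if too few colors
}

all.groups <- 1:4
cols <- c('red', 'orange', 'yellow', 'gray99')

groupColor(c(1, 2), all.groups, cols)     # groups present in the original mydf1
groupColor(c(2, 3, 4), all.groups, cols)  # groups present in mydf3
```

Inside plotGantt the same lookup could replace the color <- ... line, with all.groups passed in as an extra argument.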

digEmAll