I would like to ask a follow-up question related to the answer given in this post [Gantt style time line plot (in base R)] on Gantt plots in base R. I think it is worth a new question because these plots have broad appeal, a new question may attract more attention, and I need more space than the comments on that question allow to be specific.
The following code was given by @digEmAll. It takes a data frame with columns for a start time, an end time, and a grouping variable and turns it into a Gantt plot. I have modified @digEmAll's function very slightly so that the bars/segments in the Gantt plot are contiguous rather than separated by a gap. Here it is:
plotGantt <- function(data, res.col='resources',
                      start.col='start', end.col='end', res.colors=rainbow(30))
{
  #slightly enlarge Y axis margin to make space for labels
  op <- par('mar')
  par(mar = op + c(0,1.2,0,0))
  minval <- min(data[,start.col])
  maxval <- max(data[,end.col])
  res.colors <- rev(res.colors)
  resources <- sort(unique(data[,res.col]),decreasing=TRUE)
  plot(c(minval,maxval),
       c(0.5,length(resources)+0.5),
       type='n', xlab='Duration',ylab=NA,yaxt='n' )
  axis(side=2,at=1:length(resources),labels=resources,las=1)
  for(i in 1:length(resources))
  {
    yTop <- i+0.5
    yBottom <- i-0.5
    subset <- data[data[,res.col] == resources[i],]
    for(r in 1:nrow(subset))
    {
      color <- res.colors[((i-1)%%length(res.colors))+1]
      start <- subset[r,start.col]
      end <- subset[r,end.col]
      rect(start,yBottom,end,yTop,col=color)
    }
  }
  par(mar = op) # reset the plotting margins (op is a plain numeric vector, so it must be passed by name)
}
Here are some sample data. There are four possible groups, 1-4, but not every data frame contains all four: some have only two, some only three.
mydf1 <- data.frame(startyear=2000:2009, endyear=2001:2010, group=c(1,1,1,1,2,2,2,1,1,1))
mydf2 <- data.frame(startyear=2000:2009, endyear=2001:2010, group=c(1,1,2,2,3,4,3,2,1,1))
mydf3 <- data.frame(startyear=2000:2009, endyear=2001:2010, group=c(4,4,4,4,4,4,3,2,3,3))
mydf4 <- data.frame(startyear=2000:2009, endyear=2001:2010, group=c(1,1,1,2,3,3,3,2,1,1))
Here I run the above function, but specify four colors for plotting:
plotGantt(mydf1, res.col='group', start.col='startyear', end.col='endyear',
          res.colors=c('red','orange','yellow','gray99'))
plotGantt(mydf2, res.col='group', start.col='startyear', end.col='endyear',
          res.colors=c('red','orange','yellow','gray99'))
plotGantt(mydf3, res.col='group', start.col='startyear', end.col='endyear',
          res.colors=c('red','orange','yellow','gray99'))
plotGantt(mydf4, res.col='group', start.col='startyear', end.col='endyear',
          res.colors=c('red','orange','yellow','gray99'))
These are the plots:
What I would like to do is modify the function so that:
1) It plots all four groups on the y-axis, regardless of whether they actually appear in the data.
2) The same color is associated with each group in every plot, regardless of how many groups are present. As you can see, mydf2 has four groups and all four colors are plotted (1-red, 2-orange, 3-yellow, 4-gray). The same group-color pairs appear for mydf3, since it contains only groups 2, 3 and 4 and the colors are picked in reverse order. However, mydf1 and mydf4 get different colors for each group because they contain no group 4. Gray is still the first color chosen, but it is now used for the bottom row of the plot, which is the highest-numbered group present (group 2 in mydf1 and group 3 in mydf4).
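The color shift described in point 2 can be reproduced without drawing anything. What follows is a minimal sketch that mirrors the indexing used inside plotGantt; the helper name `assignedColors` is hypothetical and is not part of the original function.

```r
# Hypothetical helper: returns the color plotGantt would pick for each group
# present in `groups`, mirroring the function's own indexing.
assignedColors <- function(groups, res.colors) {
  res.colors <- rev(res.colors)                          # same reversal as in plotGantt
  resources  <- sort(unique(groups), decreasing = TRUE)  # only groups that occur
  idx <- ((seq_along(resources) - 1) %% length(res.colors)) + 1
  setNames(res.colors[idx], resources)                   # names are the group labels
}

cols <- c('red', 'orange', 'yellow', 'gray99')
col1 <- assignedColors(c(1,1,1,1,2,2,2,1,1,1), cols)  # groups of mydf1
col2 <- assignedColors(c(1,1,2,2,3,4,3,2,1,1), cols)  # groups of mydf2
print(col1)  # group 2 -> gray99, group 1 -> yellow
print(col2)  # group 4 -> gray99, 3 -> yellow, 2 -> orange, 1 -> red
```

Because the colors are indexed by position among the groups that happen to be present, group 1 is red in mydf2 but yellow in mydf1.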
It appears to me that the main thing I need to work on is the vector 'resources' inside the function, so that it contains all of the groups rather than only the unique groups present in the data. When I try manually overriding it to make sure it contains all the groups, e.g. with something as simple as
resources <- as.factor(1:4)
then I get an error:
'Error in rect(start, yBottom, end, yTop, col = color) : cannot mix zero-length and non-zero-length coordinates'
Presumably the for loop does not know how to handle groups that have no rows in the data.
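One likely cause, offered as a sketch rather than a confirmed diagnosis: when a group has no rows, `1:nrow(subset)` evaluates to `1:0`, which is `c(1, 0)` rather than an empty sequence, so the loop body still runs and ends up indexing row 0. `seq_len()` returns an empty vector in that case. The data frame `empty` below is made up purely for illustration.

```r
# A zero-row data frame, standing in for the subset of a group with no data
empty <- data.frame(startyear = 2000:2001, endyear = 2001:2002)[0, ]

bad  <- 1:nrow(empty)         # c(1, 0): the loop body would run twice
good <- seq_len(nrow(empty))  # integer(0): the loop body would never run
print(bad)
print(good)

# Indexing row 0 yields a zero-length value, which rect() cannot combine
# with the length-one yBottom and yTop
zero <- empty[0, 'startyear']
print(length(zero))

iterations <- 0
for (r in seq_len(nrow(empty))) {
  iterations <- iterations + 1  # skipped: nothing to iterate over
}
print(iterations)
```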
I hope that this is a reproducible and readable question and that it's clear what I'm trying to do.
EDIT: I realize that to solve the color problem, I could just specify the colors for the groups that exist in each of these sample data frames. However, my intention is to use this plot as the output of a function, where it wouldn't be known ahead of time whether all of the groups exist in a particular data frame.
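To make the goal concrete, here is one possible shape for such a function. It is only a sketch under stated assumptions, not @digEmAll's method: the name `plotGanttAll` and the argument `all.groups` (the full set of groups to show on the y-axis) are hypothetical, and colors are assumed to be given in the same order as `all.groups`.

```r
# Sketch of a plotGantt variant. `all.groups` is a hypothetical argument that
# lists every group to draw on the y-axis, whether or not it occurs in `data`.
plotGanttAll <- function(data, all.groups, res.col = 'resources',
                         start.col = 'start', end.col = 'end',
                         res.colors = rainbow(length(all.groups))) {
  stopifnot(length(res.colors) >= length(all.groups))
  op <- par('mar')
  on.exit(par(mar = op))                 # restore margins when the function exits
  par(mar = op + c(0, 1.2, 0, 0))
  # Fixed group -> color lookup, independent of which groups occur in `data`
  col.map <- setNames(res.colors[seq_along(all.groups)], all.groups)
  resources <- sort(all.groups, decreasing = TRUE)  # first group ends up on top
  plot(c(min(data[, start.col]), max(data[, end.col])),
       c(0.5, length(resources) + 0.5),
       type = 'n', xlab = 'Duration', ylab = NA, yaxt = 'n')
  axis(side = 2, at = seq_along(resources), labels = resources, las = 1)
  for (i in seq_along(resources)) {
    rows <- data[data[, res.col] == resources[i], , drop = FALSE]
    for (r in seq_len(nrow(rows))) {     # groups with no rows are skipped
      rect(rows[r, start.col], i - 0.5, rows[r, end.col], i + 0.5,
           col = col.map[[as.character(resources[i])]])
    }
  }
  invisible(col.map)                     # return the mapping for inspection
}

mydf1 <- data.frame(startyear = 2000:2009, endyear = 2001:2010,
                    group = c(1,1,1,1,2,2,2,1,1,1))
cols <- plotGanttAll(mydf1, all.groups = 1:4, res.col = 'group',
                     start.col = 'startyear', end.col = 'endyear',
                     res.colors = c('red', 'orange', 'yellow', 'gray99'))
```

With this lookup, group 1 would be red and group 4 gray in every plot, and rows for groups 3 and 4 would appear on the axis of mydf1 but stay empty.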