12

I have a dataframe that looks like this:

       person n start end
1         sam 6     0   6
2        greg 5     6  11
3     teacher 4    11  15
4         sam 4    15  19
5        greg 5    19  24
6       sally 5    24  29
7        greg 4    29  33
8         sam 3    33  36
9       sally 5    36  41
10 researcher 6    41  47
11       greg 6    47  53

Here, start and end are times or durations (sam spoke from 0 to 6, greg from 6 to 11, etc.), and n is how long the person spoke (in this case, the number of words). I want to plot this as a timeline in base R. I may eventually ask a similar question about ggplot2, but this question is specific to base R (by "base" I mean the packages that come with a standard install).
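For readers who want to reproduce the table: the start and end columns shown above are consistent with running totals of n. A minimal sketch, assuming that is how they were derived (the name dat is only illustrative):

```r
# Speakers and word counts copied from the table in the question.
person <- c("sam", "greg", "teacher", "sam", "greg", "sally",
            "greg", "sam", "sally", "researcher", "greg")
n <- c(6, 5, 4, 4, 5, 5, 4, 3, 5, 6, 6)

# Assumption: each turn starts where the previous one ended.
end   <- cumsum(n)   # running total of words
start <- end - n     # the turn begins n words before its end

dat <- data.frame(person, n, start, end, stringsAsFactors = FALSE)
dat
```

Each row of dat then describes one speaking turn as an interval on the x axis.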

The y axis will be by person and the x axis will be time. Hopefully the final product looks something like this for the data above:

[Image: Timeline_Graph]

I would like to use base R to make this, but I'm not sure how to approach it. One thought is to draw a dot plot, leave out the dots, and then go over it with square-ended segments. I'm not sure this will work, since segments() needs numeric x and y coordinates and the y axis is categorical. Another thought is to convert the factors to numeric (assign each level a number), draw a blank scatterplot, and then go over it with square-ended line segments. This could be a powerful tool in my field for looking at speech patterns.

I thank you in advance for your help.

PS: the argument for square-ended line segments is segments(..., lend=2), to save a lookup for those not familiar with all of the segments() arguments.

Tyler Rinker

3 Answers

31

You say you want a base R solution, but you don't say why. Since this is one line of code in ggplot, I show this anyway.

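library(ggplot2)
# note: in ggplot2 >= 3.4.0, line widths are set with linewidth=3; size=3 still works there but warns
ggplot(dat, aes(colour=person)) + 
    geom_segment(aes(x=start, xend=end, y=person, yend=person), size=3) +
    xlab("Duration")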

[Image: plot produced by the code above]

Andrie
  • it sounds like he's avoided all other outside dependencies for a package he's developing, and is trying to keep it that way: http://stackoverflow.com/questions/9857787/collapse-columns-by-grouping-variable-in-base. – Chase Mar 25 '12 at 18:31
  • He also might want more control of the appearance, to understand base graphics better, to integrate it with other base graphics plots, or just have a preference. Oh, and this isn't really one line. You have minimally 2 lines there, easily interpreted as 3, and you forgot install.packages('ggplot2'). – John Mar 25 '12 at 18:41
  • @Andrie, that is very nice. The reason I don't want to use ggplot is as Chase notes: I have avoided all dependencies except wordcloud (as this package does some coding in C that I am not capable of doing). That being said, the function will plot but will also return a processed data frame that can be fed to ggplot (I plan on showing this as an example in my package, though I may have to use `#` to get the code to pass the package creation tests). The reason I'm interested in ggplot is that I will use the same idea for repeated measures, and faceting will be nice there. Great work Andrie. +1 – Tyler Rinker Mar 25 '12 at 19:01
  • @TylerRinker Nice one. You have two options to include this in a package without introducing a dependency. 1) Use a `\dontrun` block http://cran.r-project.org/doc/manuals/R-exts.html#index-g_t_005cdontrun-76 2) List `ggplot2` under `Suggests:` in your package `DESCRIPTION` and then call `require(ggplot2)` in your example. In this way, the `ggplot2` package only gets loaded if the user actually wants to use it. – Andrie Mar 26 '12 at 06:46
  • @Andrie Thanks this is my first package for general consumer use (I've created 2 packages for myself before but am being as disciplined as I can for a first timer) and want to provide as quality of a product as I can. Thank you for the dependencies vs suggests info +1 – Tyler Rinker Mar 26 '12 at 06:51
  • Hi there, is there any way to add text labels on the bars? – M.Qasim Sep 14 '15 at 21:37
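The second option from the comments above (an optional, suggested dependency) might look roughly like the sketch below. The helper name plot_timeline_gg and the toy data are hypothetical and not from the thread; requireNamespace() is used here instead of require() so that nothing is attached to the search path, which is an assumption about style rather than something the commenters prescribed.

```r
# Toy data: first rows of the table from the question.
dat <- data.frame(
  person = c("sam", "greg", "teacher", "sam"),
  start  = c(0, 6, 11, 15),
  end    = c(6, 11, 15, 19),
  stringsAsFactors = FALSE
)

# Hypothetical helper: builds the ggplot2 timeline only if ggplot2 is
# installed; otherwise it prints a message and returns NULL invisibly.
plot_timeline_gg <- function(dat) {
  if (!requireNamespace("ggplot2", quietly = TRUE)) {
    message("ggplot2 is not installed; skipping the ggplot2 example")
    return(invisible(NULL))
  }
  ggplot2::ggplot(dat, ggplot2::aes(colour = person)) +
    ggplot2::geom_segment(
      ggplot2::aes(x = start, xend = end, y = person, yend = person)
    ) +
    ggplot2::xlab("Duration")
}

p <- plot_timeline_gg(dat)
```

With ggplot2 listed under Suggests: in DESCRIPTION, a guard of this kind keeps the package usable when ggplot2 is absent.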
17

Pretty similar to @John's approach, but since I did it, I will post it :)

Here's a generic function to plot a Gantt chart (no dependencies):

plotGantt <- function(data, res.col='resources', 
                      start.col='start', end.col='end', res.colors=rainbow(30))
{
  # slightly enlarge the left (y-axis) margin to make space for labels;
  # on.exit() restores the margins even if plotting stops with an error
  op <- par('mar')
  on.exit(par(mar = op))
  par(mar = op + c(0,1.2,0,0)) 

  minval <- min(data[,start.col], na.rm=TRUE)
  maxval <- max(data[,end.col], na.rm=TRUE)

  # sort in decreasing order so the alphabetically first resource is drawn at the top
  res.colors <- rev(res.colors)
  resources <- sort(unique(data[,res.col]), decreasing=TRUE)

  # empty plot; the y axis is labelled by hand with the resource names
  plot(c(minval,maxval),
       c(0.5,length(resources)+0.5),
       type='n', xlab='Duration', ylab=NA, yaxt='n')
  axis(side=2, at=seq_along(resources), labels=resources, las=1)
  for(i in seq_along(resources))
  {
    # one color per resource, recycled if there are more resources than colors
    color <- res.colors[((i-1)%%length(res.colors))+1]
    rows <- data[data[,res.col] == resources[i],]
    # rect() is vectorized, so all intervals of this resource are drawn at once
    rect(rows[,start.col], i-0.1, rows[,end.col], i+0.1, col=color)
  }
}

Usage example:

data <- read.table(text=
'"person","n","start","end"
"sam",6,0,6
"greg",5,6,11
"teacher",4,11,15
"sam",4,15,19
"greg",5,19,24
"sally",5,24,29
"greg",4,29,33
"sam",3,33,36
"sally",5,36,41
"researcher",6,41,47
"greg",6,47,53',sep=',',header=T)

plotGantt(data, res.col='person',start.col='start',end.col='end',
          res.colors=c('green','blue','brown','red','yellow'))

Result:

[Image: plot produced by the call above]

digEmAll
  • This answer also fulfilled the parameters I listed. It looks terrific as well. Thank you for sharing, a slightly different approach. +1 – Tyler Rinker Mar 25 '12 at 19:20
  • Also thank you for the word Gantt as well. I didn't know what it was called. – Tyler Rinker Mar 25 '12 at 19:22
  • @TylerRinker: You're welcome :). However I slightly changed the code to make space for the labels. – digEmAll Mar 25 '12 at 21:10
  • I want to include some of this work in a package. I want to properly cite you. Can you please contact me @ tyler.rinker@gmail.com – Tyler Rinker Sep 08 '12 at 22:21
  • @TylerRinker: thanks but there's no need to cite me for this little piece of code. Feel free to use it ;) – digEmAll Sep 10 '12 at 07:13
  • @digEmAll - hi, I have a follow-up question regarding how to plot categories on the y when they exist in the whole df but not in a subset you're plotting, and also the ability to consistently use colors for particular y-axis categories. I posted a question on this here- http://stackoverflow.com/questions/26374327/gantt-plot-in-base-r-modifying-plot-properties - hope you can help. thanks. – jalapic Oct 15 '14 at 16:44
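On the follow-up in the last comment (keeping absent categories on the y axis and using the same color for a given category across subsets), one possible approach is to fix the category order and the color lookup once, from the full data, and reuse them for every subset. This is only a sketch and not digEmAll's function; the names all_people, person_col and plot_subset are hypothetical.

```r
# Toy data: first rows of the table from the question.
dat <- data.frame(
  person = c("sam", "greg", "teacher", "sam", "greg", "sally"),
  start  = c(0, 6, 11, 15, 19, 24),
  end    = c(6, 11, 15, 19, 24, 29),
  stringsAsFactors = FALSE
)

# Fix the category order and the colors once, using the FULL data.
all_people <- sort(unique(dat$person))
person_col <- setNames(c("green", "blue", "brown", "red"), all_people)

plot_subset <- function(sub) {
  # y positions come from the full set of levels, not from the subset
  y <- as.integer(factor(sub$person, levels = all_people))
  plot(range(dat$start, dat$end), c(0.5, length(all_people) + 0.5),
       type = "n", xlab = "Duration", ylab = NA, yaxt = "n")
  axis(2, at = seq_along(all_people), labels = all_people, las = 1)
  # colors are looked up by name, so each person keeps the same color
  rect(sub$start, y - 0.1, sub$end, y + 0.1, col = person_col[sub$person])
  invisible(y)
}

# greg is dropped from the data but still gets a (now empty) row on the axis
y_sub <- plot_subset(dat[dat$person != "greg", ])
```

Because both the y positions and the colors are derived from all_people, plots of different subsets stay visually comparable.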
8

Although the y-axis is categorical, all you need to do is assign numbers to the categories (1:5) and keep track of them. The default as.numeric() of a factor will usually number the levels alphabetically, but you should check anyway. Make your plot with the yaxt = 'n' argument, then use the axis() command to put in a y-axis.

axis(2, 1:5, myLabels)

Keep in mind that whenever you're plotting the only way to place things is with a number. Categorical x or y values are always just the numbers 1:nCategories with category name labels in place of the numbers on the axis.
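To check which number each category receives, the factor's levels and integer codes can be inspected directly. A small sketch using the speakers from the question (the variable names are only illustrative):

```r
# Speakers from the question, in order of appearance.
person <- c("sam", "greg", "teacher", "sam", "greg", "sally",
            "greg", "sam", "sally", "researcher", "greg")

f <- factor(person)      # levels are sorted by default
codes <- as.integer(f)   # position of each value among the levels

levels(f)                # the labels that belong on the axis
codes                    # the numbers that are actually plotted

# lookup table: which number stands for which category
data.frame(code = seq_along(levels(f)), label = levels(f))
```

The codes are what gets plotted, and levels(f) supplies the labels passed to axis().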

Something like the following gets you close enough (assuming your data.frame object is called datf)...

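# factor() first, so this also works when person is a character column
datf$pNum <- as.numeric(factor(datf$person))
# empty plot: only the coordinate system is set up; the y-axis is added by hand
plot(datf$pNum, xlim = c(0, 53), type = 'n', yaxt = 'n', xlab ='Duration (words)', ylab = 'person', main = 'Speech Duration')
axis(2, 1:5, sort(unique(datf$person)), las = 2, cex.axis = 0.75)
# square-ended horizontal segments, one per speaking turn
with(datf, segments(start, pNum, end, pNum, lwd = 3, lend=2))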
Tyler Rinker
John