
I would like to create a horizontal ‘stacked bar’ type plot in which date runs along the x-axis and my samples appear as bars on the y-axis. In the simple example below, I have three samples (a, b, c), each of which takes one of three values (0, 1, 2) on every date. I would like each horizontal bar to be coloured according to the value at each time step along the x-axis, so that I end up with three horizontal bars (one per sample) that run from my first to my last time point and consist of a series of blocks whose colours correspond to the different values.

For example, say I want value 0 to be blue, value 1 to be yellow and value 2 to be red: for sample a, the first two days of the trace would be blue, the next two days would be yellow, followed by a single blue day, and so on.

Example data:

df <- structure(list(date = c("30/04/2011", "01/05/2011", "02/05/2011", "03/05/2011", "04/05/2011", "05/05/2011", "06/05/2011", "07/05/2011", "08/05/2011", "09/05/2011", "10/05/2011", "11/05/2011", "12/05/2011", "13/05/2011", "14/05/2011", "15/05/2011", "16/05/2011", "17/05/2011", "18/05/2011", "19/05/2011", "20/05/2011", "21/05/2011", "22/05/2011", "23/05/2011", "24/05/2011", "25/05/2011", "26/05/2011", "27/05/2011", "28/05/2011", "29/05/2011", "30/05/2011", "31/05/2011", "01/06/2011", "02/06/2011", "03/06/2011", "04/06/2011", "05/06/2011", "06/06/2011", "07/06/2011", "08/06/2011", "09/06/2011", "10/06/2011", "11/06/2011", "12/06/2011", "13/06/2011", "14/06/2011", "15/06/2011", "16/06/2011", "17/06/2011", "18/06/2011", "19/06/2011", "20/06/2011", "21/06/2011", "22/06/2011", "23/06/2011", "24/06/2011", "25/06/2011", "26/06/2011", "27/06/2011", "28/06/2011", "29/06/2011", "30/06/2011", "01/07/2011", "02/07/2011", "03/07/2011", "04/07/2011", "05/07/2011", "06/07/2011", "07/07/2011", "08/07/2011", "09/07/2011", "10/07/2011", "11/07/2011", "12/07/2011", "13/07/2011", "14/07/2011", "15/07/2011", "16/07/2011", "17/07/2011", "18/07/2011", "19/07/2011", "20/07/2011", "21/07/2011", "22/07/2011", "23/07/2011", "24/07/2011"), a = c(0L, 0L, 1L, 1L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L), b = c(0L, 1L, 1L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L), c = c(1L, 1L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L, 0L)), .Names = c("date", "a", "b", "c"), class = "data.frame", row.names = c(NA, -86L))

head(df)
#         date a b c
# 1 30/04/2011 0 0 1
# 2 01/05/2011 0 1 1
# 3 02/05/2011 1 1 0
# 4 03/05/2011 1 0 0
# 5 04/05/2011 0 0 0
# 6 05/05/2011 1 0 1

This must be a really easy thing to achieve but I can’t get my head around it (i.e. `barplot()` doesn't seem to work for this). Any help would be appreciated. Thanks!

jjulip
  • I am not sure how to interpret your data, because if you are defining a value between two dates, the first or last value should really be NA. – dayne Apr 29 '15 at 19:52

3 Answers


For a ggplot2 plot first convert df to long form (using melt from the reshape2 package), convert the date column to "Date" class and the value column to a factor and then use geom_tile:

library(ggplot2)
library(reshape2)

long <- melt(df, measure.vars = 2:4)
long <- transform(long, date = as.Date(long$date, "%d/%m/%Y"), value = factor(value))

ggplot(long, aes(date, variable)) + 
   geom_tile(aes(fill = value)) + 
   scale_fill_manual(values = c("blue", "yellow", "red"))

[Screenshot of the resulting plot]
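The follow-up suggestions in the comments on this answer (a bar `height` set outside `aes()` and `theme_bw()` for a plainer look) could be combined roughly as follows. This is only a sketch under stated assumptions: `df_toy` is a small made-up stand-in for the question's `df`, and base R's `stack()` is swapped in for `melt()` so that only ggplot2 is needed, which means the long-form columns are called `values` and `ind` rather than `value` and `variable`.

```r
library(ggplot2)

# Toy wide data standing in for the question's `df` (hypothetical values)
df_toy <- data.frame(
  date = c("30/04/2011", "01/05/2011", "02/05/2011", "03/05/2011"),
  a = c(0L, 0L, 1L, 2L),
  b = c(0L, 1L, 1L, 2L),
  c = c(1L, 1L, 0L, 0L)
)

# Base-R reshape to long form: stack() returns columns `values` and `ind`
long <- stack(df_toy[-1])
# Repeat the parsed dates once per sample column
long$date <- rep(as.Date(df_toy$date, "%d/%m/%Y"), times = ncol(df_toy) - 1)
long$values <- factor(long$values, levels = 0:2)

p <- ggplot(long, aes(date, ind)) +
  geom_tile(aes(fill = values), height = 0.7) +   # height < 1 leaves gaps between bars
  scale_fill_manual(values = c("0" = "blue", "1" = "yellow", "2" = "red")) +
  theme_bw()                                      # white background, black frame
print(p)
```

Naming the colours by factor level in `scale_fill_manual()` keeps the mapping stable even if one of the values happens not to occur in the data.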

G. Grothendieck
  • This is nice. If you cannot tell from my answer I am resistant to the ggplot2 package, but this is a pretty convincing use case. Does it treat each date individually? (Rather than my answer that fills from date to date.) – dayne Apr 30 '15 at 01:35
  • `geom_tile` tiles the area with colored blocks as shown. – G. Grothendieck Apr 30 '15 at 01:47
  • @G. Grothendieck: Thank you for this. Great that it centres the block on each date. Not being a ggplot user, is there an easy way to change the defaults so that the plot resembles one made in base graphics (i.e. so that there are spaces between individual bars, no grayscale background and black axes)? Many thanks! – jjulip Apr 30 '15 at 08:24
  • I managed to find the solutions to my ggplot queries above by adding `height` to the ggplot line to increase the spaces between bars, `ggplot(long, aes(date, variable, height=0.7))`, and using `theme_bw()` to remove the grey background, grids etc. I have accepted this answer because it centres each block on the date rather than between dates, but I wish I could accept them all! Thank you all for your very helpful responses. – jjulip Apr 30 '15 at 09:31
  • You might want to write it like this with the `height` outside of `aes`: `geom_tile(aes(fill = value), height = 0.7)` . – G. Grothendieck May 06 '15 at 04:15

This is very manual, but I think it answers your question. To my knowledge there is no function that does this for you, but I could very likely be wrong. I just used polygon to draw the boxes for each group. NOTE: You need to change your date field to a Date class. The code below assumes the question's data frame is stored as dat.

dat <- df  # the code below refers to the question's data frame as `dat`
dat$date <- as.Date(dat$date, "%d/%m/%Y")

plot(dat$a~dat$date, type = "n", yaxt = "n", ylab = "", 
     xlab = "", bty = "n", ylim = c(0, 4))
draw.box <- function(y, x1, x2, h, col) {
  polygon(x = c(x1, x1, x2, x2), 
          y = c(y - h/2, y + h/2, y + h/2, y - h/2),
          col = col, border = col)
}

# Colours are as chosen in this answer; the question asks for 0 = blue, 1 = yellow, 2 = red
for (j in c("a", "b", "c")) {
  for (i in 2:nrow(dat)) {
    bcol <- switch(as.character(dat[(i - 1), j]),
                   "0" = "red",
                   "1" = "blue",
                   "2" = "yellow")
    yloc <- switch(j,
                   "a" = 3,
                   "b" = 2,
                   "c" = 1)
    draw.box(y = yloc, 
             h = 0.75, 
             col = bcol, 
             x1 = dat[(i - 1), "date"], 
             x2 = dat[i, "date"])
  }
}

axis(side = 2, at = 3:1, labels = c("A", "B", "C"), 
     tick = FALSE, las = 2)


Here the last value is not plotted, because there is no "end date" to bound the bar.
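One possible way to centre each block on its date, and so also draw the final value, is to give every observation a box that runs from half a day before to half a day after its date, using the vectorised `rect()`. This is a minimal sketch rather than the code above: the toy data and the names `cols`, `samples` and `yloc` are made up for illustration, and the colours follow the mapping requested in the question.

```r
# Toy data standing in for the question's data frame (hypothetical values)
dat <- data.frame(
  date = as.Date("2011-04-30") + 0:5,
  a = c(0L, 0L, 1L, 1L, 0L, 2L),
  b = c(0L, 1L, 1L, 0L, 2L, 2L),
  c = c(1L, 1L, 0L, 2L, 2L, 0L)
)

cols <- c("0" = "blue", "1" = "yellow", "2" = "red")   # value -> colour
samples <- c("a", "b", "c")
yloc <- setNames(rev(seq_along(samples)), samples)     # a on top, c at the bottom

x <- as.numeric(dat$date)                              # days since 1970-01-01

# Empty plot with a date axis, widened by half a day on each side
plot(dat$date, rep(0, nrow(dat)), type = "n", yaxt = "n", ylab = "", xlab = "",
     bty = "n", xlim = range(x) + c(-0.5, 0.5), ylim = c(0, 4))

for (s in samples) {
  # One box per observation, centred on its date
  rect(xleft = x - 0.5, xright = x + 0.5,
       ybottom = yloc[[s]] - 0.375, ytop = yloc[[s]] + 0.375,
       col = cols[as.character(dat[[s]])], border = NA)
}

axis(side = 2, at = yloc, labels = toupper(samples), tick = FALSE, las = 2)
```

Because `rect()` accepts vectors, each sample needs a single call, and no observation is dropped for lack of an end date.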

dayne
  • @ dayne: This is excellent thank you! I want the data to specifically relate to the dates that they are assigned to in the df, rather than falling between two dates, but I think that this can be achieved quite easily by shifting the axis labels to centre on the bar rather than the tick. Thanks for your help! – jjulip Apr 29 '15 at 20:45

I was able to get barplot() to work here, but man, I had to jump through some hoops.

First, barplot() requires a matrix of bar segment lengths, which means we have to get the run lengths of the continuous stretches of color from your input data to define these lengths (note: see end of answer for an alternative that treats each data point as a separate segment). We also need to capture which colors apply to each run length, which, fortunately, rle() is perfect for, as it captures both run lengths and values in a two-component list.
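As a small illustration of what `rle()` returns, here it is applied to the first six values of sample a from the question (the variable names `v` and `r` are only for this sketch):

```r
v <- c(0L, 0L, 1L, 1L, 0L, 1L)   # first six values of sample a

r <- rle(v)
r$lengths                        # run lengths: 2 2 1 1
r$values                         # value of each run: 0 1 0 1

# Bind the two components into a matrix with columns "lengths" and "values"
m <- do.call(cbind, r)
m

# inverse.rle() reconstructs the original vector from the runs
identical(inverse.rle(r), v)
```

The run lengths become the bar segment lengths, and the values select the colour of each segment.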

Second, barplot() has an unfortunate limitation regarding coloring of stacked bars. Namely, if you provide a normal-looking intuitively-structured matrix with two or more stacked bars (meaning two or more columns) to the height parameter, and you want to color each stacked bar using a different sequence of colors from other stacked bars, then you won't be able to. At least, not with that matrix structure.

This is because the col argument can only accept a vector of colors; it can't accept a matrix or a list of vectors or anything else to correspond with the primary matrix input that is passed to the height parameter. If you try to provide an overly long vector of colors, barplot() ignores the excess.

Based on Stacked bar plot with different combinations of colors in R, the solution is to offset each bar within the matrix, setting all adjacent columns to zero, thus allowing you to set a different color for every bar segment in every bar.
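To make the offset idea concrete, here is a minimal sketch with two bars and made-up run lengths (`runs`, `height` and `col` here are illustrative and are not the data from the question). Each segment gets its own matrix row, non-zero in exactly one column, so the single `col` vector can assign a different colour to every segment of every bar.

```r
# Hypothetical run lengths for two bars
runs <- list(a = c(2, 1, 3), b = c(4, 2))

n <- sum(lengths(runs))
height <- matrix(0, nrow = n, ncol = length(runs),
                 dimnames = list(NULL, names(runs)))

# Place each bar's segments in its own block of rows; all other cells stay 0
row <- 0
for (j in seq_along(runs)) {
  idx <- row + seq_along(runs[[j]])
  height[idx, j] <- runs[[j]]
  row <- row + length(runs[[j]])
}
height

# One colour per row, i.e. per segment: three for bar a, then two for bar b
col <- c("blue", "yellow", "blue", "yellow", "red")
barplot(height, col = col, horiz = TRUE)
```

The zero-length segments are stacked as well but take up no space, which is why the colours line up with the non-zero entries.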

It was not easy to massage the data into the desired shape, but with the help of @akrun's answer from a question I just asked a moment ago, How to rbind vectors into different columns, leaving NAs in remaining cells, we can accomplish all of this as follows:

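# Run-length encode each sample: one (lengths, values) matrix per column
pd <- lapply(df[-1], function(v) do.call(cbind, rle(v)))
# Offset each sample's run lengths into its own column; the remaining cells become NA
height <- as.matrix(setNames(
  reshape(cbind(id = 1:sum(sapply(pd, nrow)), stack(lapply(pd, function(x) x[, 'lengths']))),
          direction = 'wide', timevar = 'ind')[-1],
  names(pd)))
height[is.na(height)] <- 0
# One colour per run: 0 = blue, 1 = yellow, 2 = red
col <- c('blue', 'yellow', 'red')[do.call(c, lapply(pd, function(x) x[, 'values'])) + 1]
barplot(t(apply(height, 1, rev)), col = col, horiz = TRUE, axes = FALSE)
axis(1, 0:(nrow(df) - 1), labels = df$date)
title('Horizontal Stacked Bar Plot')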

[Image: the resulting bar plot]

And here's the data, for reference:

pd;
## $a
##       lengths values
##  [1,]       2      0
##  [2,]       2      1
##  [3,]       1      0
##  [4,]       1      1
##  [5,]       3      0
##  [6,]       1      1
##  [7,]       3      0
##  [8,]       1      1
##  [9,]      13      0
## [10,]      22      2
## [11,]      12      0
## [12,]       4      1
## [13,]       3      0
## [14,]       2      1
## [15,]       3      0
## [16,]       2      1
## [17,]       1      0
## [18,]       1      1
## [19,]       8      0
## [20,]       1      1
##
## $b
##       lengths values
##  [1,]       1      0
##  [2,]       2      1
##  [3,]       4      0
##  [4,]       2      1
##  [5,]       3      0
##  [6,]       1      1
##  [7,]       9      0
##  [8,]      22      2
##  [9,]       3      0
## [10,]       1      1
## [11,]      10      0
## [12,]       1      1
## [13,]       7      0
## [14,]       3      1
## [15,]       5      0
## [16,]       2      1
## [17,]       5      0
## [18,]       5      1
##
## $c
##       lengths values
##  [1,]       2      1
##  [2,]       3      0
##  [3,]       1      1
##  [4,]       1      0
##  [5,]       1      1
##  [6,]       1      0
##  [7,]       1      1
##  [8,]       1      0
##  [9,]       1      1
## [10,]      13      0
## [11,]      30      2
## [12,]      16      0
## [13,]       1      1
## [14,]       7      0
## [15,]       3      1
## [16,]       4      0
##
height;
##     a  b  c
## 1   2  0  0
## 2   2  0  0
## 3   1  0  0
## 4   1  0  0
## 5   3  0  0
## 6   1  0  0
## 7   3  0  0
## 8   1  0  0
## 9  13  0  0
## 10 22  0  0
## 11 12  0  0
## 12  4  0  0
## 13  3  0  0
## 14  2  0  0
## 15  3  0  0
## 16  2  0  0
## 17  1  0  0
## 18  1  0  0
## 19  8  0  0
## 20  1  0  0
## 21  0  1  0
## 22  0  2  0
## 23  0  4  0
## 24  0  2  0
## 25  0  3  0
## 26  0  1  0
## 27  0  9  0
## 28  0 22  0
## 29  0  3  0
## 30  0  1  0
## 31  0 10  0
## 32  0  1  0
## 33  0  7  0
## 34  0  3  0
## 35  0  5  0
## 36  0  2  0
## 37  0  5  0
## 38  0  5  0
## 39  0  0  2
## 40  0  0  3
## 41  0  0  1
## 42  0  0  1
## 43  0  0  1
## 44  0  0  1
## 45  0  0  1
## 46  0  0  1
## 47  0  0  1
## 48  0  0 13
## 49  0  0 30
## 50  0  0 16
## 51  0  0  1
## 52  0  0  7
## 53  0  0  3
## 54  0  0  4
col;
##  [1] "blue"   "yellow" "blue"   "yellow" "blue"   "yellow" "blue"   "yellow" "blue"   "red"    "blue"   "yellow" "blue"   "yellow" "blue"   "yellow" "blue"   "yellow" "blue"   "yellow" "blue"   "yellow" "blue"
## [24] "yellow" "blue"   "yellow" "blue"   "red"    "blue"   "yellow" "blue"   "yellow" "blue"   "yellow" "blue"   "yellow" "blue"   "yellow" "yellow" "blue"   "yellow" "blue"   "yellow" "blue"   "yellow" "blue"
## [47] "yellow" "blue"   "red"    "blue"   "yellow" "blue"   "yellow" "blue"

Lastly, I did attempt to construct the plot without the run length step, and instead just treat each data point as its own segment. This works (although you still have to do the offset thing), but may not be what you want. Here's a screenshot of what it looks like:

[Image: bar plot with each data point drawn as a separate segment]

And here's the code, in case you like this better:

# Treat every data point as its own segment of length 1
pd <- lapply(df[-1], function(v) rep(1, length(v)))
height <- as.matrix(setNames(
  reshape(cbind(id = 1:sum(sapply(pd, length)), stack(pd)),
          direction = 'wide', timevar = 'ind')[-1],
  names(pd)))
height[is.na(height)] <- 0
# One colour per data point: 0 = blue, 1 = yellow, 2 = red
col <- c('blue', 'yellow', 'red')[do.call(c, df[-1] + 1)]
barplot(t(apply(height, 1, rev)), col = col, horiz = TRUE, axes = FALSE)
axis(1, 0:(nrow(df) - 1), labels = df$date)
title('Horizontal Stacked Bar Plot')
bgoldst
  • This is also excellent. Thank you! I will try to convert this so that x-axis labels align with bars rather than ticks. Thanks for the nice example plots. – jjulip Apr 29 '15 at 21:05