7

I would like present propensity score matching stats on 2 groups (unmatched in BW, matched in color) and would like to use mirrored histograms like the following

Is it possible to overlay 4 different histograms in base R? Is there any package providing this functionality?

The x-axis is [0,1] bounded (it is a probability) and the BW columns are always bigger or equal to the colored columns (that is there cannot be BW columns "behind" the colored columns).

Image from http://www.ncbi.nlm.nih.gov/pubmed/22244556

ECII
  • 10,297
  • 18
  • 80
  • 121
  • You can overlay base graphics with `par(new=TRUE)`. That's a start. And [this answer to another question](http://stackoverflow.com/a/3557042/3005513) might be helpful. – Alex A. Apr 28 '15 at 20:07
  • Thanks. ´plot.histogram()´ also supports add=T. The tricky part is the "negative" histograms. – ECII Apr 28 '15 at 20:15
  • Yeah. I'm sure `ggplot` has some way to do it... – Alex A. Apr 28 '15 at 20:17
  • I am wondering if there is a more primitive way to plot the bars directly from the hist() object. Then it could be possible to plot the "negative" histogram. – ECII Apr 28 '15 at 20:20
  • In case my answer was not clear, I added a function that shows how one would take the approach I illustrated. – dayne Apr 29 '15 at 14:50

2 Answers2

9

You can use something like the following. You would want to pre-calculate the hist objects to get the correct ylim values, then use axis and mtext or title to properly label your graph.

set.seed(1234)
x <- rnorm(100, 0, 1)

plot.new()
plot.window(ylim = c(-40, 40), xlim = range(x))
p <- list(axes = FALSE, xlab = "", ylab = "", main = "")
par(new = TRUE)  
do.call(hist, c(list(x = x, ylim = c(-40, 40)), p))
par(new = TRUE)
do.call(hist, c(list(x = x, ylim = c(40, -40)), p))
axis(side = 2, 
     at = pretty(par()$usr[3:4]), 
     labels = abs(pretty(par()$usr[3:4])))
axis(side = 1)

enter image description here

EDIT

## Create some fake data
set.seed(1234)
d <- rnorm(250, 0, 1)
e <- rnorm(250, 1, 1)
f <- rlnorm(100, 0, .2)
g <- rlnorm(100, 1, .2)

## Function for plotting
multhist <- function(..., bin.width, col, dir, xlab = NULL, ylab = NULL,
                     main = NULL) {

  vals <- list(...)
  vrng <- range(vals)

  brks <- seq(vrng[1] - abs(vrng[1]*0.1), 
              vrng[2] + abs(vrng[2]*0.1), 
              by = bin.width)

  yrng <- max(sapply(lapply(vals, hist, breaks = brks), "[[", "counts"))
  yrng <- 1.2*c(-1*yrng, yrng)

  plot.new()
  plot.window(ylim = yrng, xlim = vrng)

  addhist <- function(x, col, dir) {
    par(new = TRUE)  
    hist(x = x, ylim = dir*yrng, col = col, xlab = "", ylab = "", 
         main = "", axes = FALSE, breaks = brks)
  }

  mapply(addhist, x = vals, col = col, dir = dir)

  py <- pretty(yrng)
  py <- py[py >= yrng[1] & py <= yrng[2]]
  axis(side = 2, at = py, labels = abs(py))
  axis(side = 1)
  title(main = main, xlab = xlab, ylab = ylab)

}

You can give the function numeric vectors, as well as vectors for the corresponding colors and directions (1 or -1). I did not do the formal checking on the lengths of vals, col, and dir, but it is pretty straight forward.

## Use the function
multhist(d, e, f, g, bin.width = 0.5, 
         col = c("white", "white", "lightgreen", "darkgreen"), 
         dir = c(1, -1, 1, -1), xlab = "xlabel", ylab = "ylabel", 
         main = "title")

enter image description here

dayne
  • 7,504
  • 6
  • 38
  • 56
4

You can use base barplot twice, you just need to compute negative values for the down plot e.g. :

upvalues <- data.frame(Green=c(0,10,90,140,30, 20),White=c(70,50,30,20,5, 0))
downvalues <- data.frame(Green=c(-80,-70,-10,-60,-10,-10),White=c(0,-100,-60,-5,-15, -10))

# barplot accept only matrices
up <- t(as.matrix(upvalues))
down <- t(as.matrix(downvalues))

# empty plot to make room for everything
yspace <- 1.2 # it means 20% of space for the inside labels
plot(x=c(0,max(ncol(up))),
     y=c(min(colSums(down))*yspace ,max(colSums(up))*yspace), 
     type='n', xaxt='n', yaxt='n', ann=FALSE)
a <- barplot(up,col=c('green','white'),space=0,add=T)
b <- barplot(down,col=c('darkgreen','white'),space=0,add=T,axes=F)

axis(side=1,at=seq.int(0,by=ncol(up)/10,length.out=11),
     labels=seq.int(0,by=10,length.out=11))
title(xlab = 'x label', ylab='y label', main = 'Title')
mtext('UP',line=-1,side=3,col='green')
mtext('DOWN',line=-1,side=1,col='darkgreen')

enter image description here

digEmAll
  • 56,430
  • 9
  • 115
  • 140
  • Is it possible to get it to work with different for different lengths of Green and White vectors in both upvalues and downvalues? Perhaps get it to use a hist object for cosistency? – ECII Apr 29 '15 at 06:37
  • This example starts from predefined and aligned bar values. If you preprocess your input (having different lengths) maybe using hist objects, you can arrive to this format and then use the posted code. – digEmAll Apr 29 '15 at 10:00
  • Using this approach would create invalid x-values. ie. using a categorical plotting function for continuous data is not a safe/accurate approach. – dayne Apr 29 '15 at 14:58
  • Can you please elaborate a bit more on that @dayne – ECII Apr 29 '15 at 20:45
  • With this approach the x-values end up getting arbitrarily added. `barplot` is intended to graph frequencies by category (not by a continuous value). – dayne Apr 30 '15 at 01:24
  • Maybe it' me but actually, I don't see the difference with the histogram approach. I mean, when you calculate the histograms you get positions and heights of the "bars", so you basically obtain categorical data and you can use this approach... – digEmAll Apr 30 '15 at 07:14
  • @digEmAll The difference, in my mind, is that you cannot specify the location of the bars in `barplot` -- only the width and the spacing. – dayne Apr 30 '15 at 20:19
  • Ok, now I see what you mean. Well, yes you can't specify the position but actually, since I assume the OP wants equally spaced bars, once you know the number of bars you want to divide the 0-100 range, you can easily calculate the heights and you don't need any x-position at all, since barplot will do the work for you. Yeah, it's maybe more complicated than your approach for continuous data, but since the OP hasn't provided any input I just recreated a plot with similar appearance ;) – digEmAll Apr 30 '15 at 22:04