I am writing R scripts that play a small role in a chain of commands I execute from a terminal. Basically, I do much of my data manipulation in a Python script and then pipe the output to my R script for plotting. So, from the terminal I execute commands that look something like python whatever.py | R CMD BATCH do_some_plotting.R. This workflow has worked well for me so far, but I have now reached a point where I want to overlay multiple histograms on the same plot, inspired by this answer to another user's question on Stack Overflow.

Inside my R script, my plotting code looks like this:

pdf("my_output.pdf")
plot(hist(d$original,breaks="FD",prob=TRUE), col=rgb(0,0,1,1/4),xlim=c(0,4000),main="original - This plot is in beta")
plot(hist(d$minus_thirty_minutes,breaks="FD",prob=TRUE), col=rgb(1,0,0,1/4),add=T,xlim=c(0,4000),main="minus_thirty_minutes - This plot is in beta")

Notably, I am using add=T, which is presumably meant to specify that the second plot should be overlaid on top of the first. When my script has finished, the result I am getting is not two histograms overlaid on top of each other, but rather a 3-page PDF whose 3 individual plots contain the titles:

i) Histogram of d$original
ii) original - This plot is in beta
iii) Histogram of d$minus_thirty_minutes

So there are two points I'm looking to clarify. Firstly, even if the plots weren't overlaid, I would expect a 2-page PDF, not a 3-page PDF. Can someone explain why I am getting 3 pages? Secondly, is there a correction I can make to get just the two histograms plotted, both on the same plot (i.e. a 1-page PDF)?

The other Stack Overflow question/answer I linked to in the first paragraph did mention that alpha-blending isn't supported on all devices, so I'm curious whether this has anything to do with it. Either way, it would be good to know whether there is an R-based solution to my problem or whether I'm going to have to pipe my data into a different language/plotting engine.

Bryce Thomas

1 Answer

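Your problem is that hist() draws a plot by default and invisibly returns a histogram object that can itself be plotted. So calling plot(hist(...)) draws the histogram twice: once from hist() (with the default title "Histogram of d$original") and once from plot() (with your custom title). That accounts for the first two pages of your PDF. In the second call, hist() again draws its own plot on a new page, and plot(..., add=T) then overlays onto that page instead of starting a new one, so you get a third page but not a fourth. To fix this, either pass plot = FALSE to hist(), or do not wrap the calls to hist() in plot().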
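A minimal sketch of that behaviour, using simulated data in place of d$original (the vector x, the temporary output file, and the title are made up for illustration). With plot = FALSE, hist() only computes the bins and returns the object, so the single plot() call is the only thing that draws:

```r
# Simulated stand-in for d$original (an assumption, not the asker's data)
set.seed(1)
x <- rnorm(200, mean = 2000, sd = 500)

out <- tempfile(fileext = ".pdf")  # throwaway file rather than my_output.pdf
pdf(out)

# plot = FALSE: compute the bins and return the object without drawing
h <- hist(x, breaks = "FD", plot = FALSE)
class(h)  # "histogram"

# The only drawing call, so the PDF gets exactly one plot from this script
plot(h, col = rgb(0, 0, 1, 1/4), main = "drawn once, by plot()")

dev.off()
```

The returned object carries the bin edges and heights (h$breaks, h$counts, h$density), which is what plot() uses to draw the bars.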

To get this to work as a single-page PDF (using the example from "How to plot two histograms together in R?"):

set.seed(42)
p1 <- hist(rnorm(500, 4), plot = FALSE)   # centered at 4
p2 <- hist(rnorm(500, 6), plot = FALSE)   # centered at 6

pdf('foo.pdf')
plot(p1, col = rgb(0, 0, 1, 1/4), xlim = c(0, 10))              # first histogram
plot(p2, col = rgb(1, 0, 0, 1/4), xlim = c(0, 10), add = TRUE)  # second, overlaid on the same page
dev.off()
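Applied to the columns from the question, a sketch might look like the following. The data frame d is simulated here because the real data is not shown, and the shared ylim and the combined title are my own additions. As far as I know, hist() warns that prob = TRUE is not made use of when plot = FALSE, so the density scaling is requested with freq = FALSE in plot() instead. The pdf() device handles semi-transparent colours, so alpha blending should not be the obstacle here.

```r
# Simulated stand-in for the asker's data frame (an assumption)
set.seed(42)
d <- data.frame(
  original             = rgamma(1000, shape = 4, scale = 300),
  minus_thirty_minutes = rgamma(1000, shape = 4, scale = 350)
)

# Compute both histograms without drawing anything
h1 <- hist(d$original,             breaks = "FD", plot = FALSE)
h2 <- hist(d$minus_thirty_minutes, breaks = "FD", plot = FALSE)

# Shared y-range so the taller histogram is not clipped
ymax <- max(h1$density, h2$density)

out <- tempfile(fileext = ".pdf")  # swap in "my_output.pdf" for real use
pdf(out)
# freq = FALSE plots densities, which is what prob = TRUE was asking for
plot(h1, freq = FALSE, col = rgb(0, 0, 1, 1/4),
     xlim = c(0, 4000), ylim = c(0, ymax),
     main = "original vs. minus_thirty_minutes")
# add = TRUE draws onto the existing page instead of starting a new one
plot(h2, freq = FALSE, col = rgb(1, 0, 0, 1/4), add = TRUE)
dev.off()
```

The axis limits are set once in the first plot() call; the overlaid call reuses that coordinate system.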
mnel