
Is it possible to store the information generated by geom_histogram() in such a way that it can be retrieved at a later stage? I want to calculate histograms of a large dataset and store them so that I can add another layer of information at a later stage.

I did previously think about using a pdf or jpeg to do this (and asked a question recently on the topic) but I think that it would be cleaner if I managed to use the actual data.

djq
  • Do you mean storing the result of a single call to `geom_histogram` or do you mean you want to store the entire image of an actual histogram for later use? – joran Mar 30 '12 at 03:15
  • All ggplot2 calls can be saved to a variable. So long as the variable remains unchanged, you can add additional layers to it. `plot1 <- ggplot(x, aes(x,y))` then `plot1 + geom_... + opts() + ...` – Brandon Bertelsen Mar 30 '12 at 03:25
  • I realize that I can save the `ggplot2` call to a variable, but can I write that variable to disk and access it at a later stage? – djq Mar 30 '12 at 13:01
  • Do I have to store it as an `RData` object? I've used `save` before, I'm just not clear on how to save the `ggplot` by itself (or how to read it back in). – djq Mar 30 '12 at 13:11

1 Answer


I'm just moving my comment down as an answer...

All ggplot2 calls can be saved to a variable. So long as the variable remains intact, you can add additional layers to it. As with any other variable or environment, these plot variables can also be saved to a file for later use.

For example:

library(ggplot2)

dat <- data.frame(x = rnorm(10000), y = rnorm(10000))
plot1 <- ggplot(dat, aes(x))
plot2 <- ggplot(dat, aes(y))

save(list = ls(pattern = "plot"), file = "~/Plots.Rdata") # Save every variable whose name contains "plot"

rm(plot1, plot2) # Remove them from the workspace

load("~/Plots.Rdata") # Reload the plots under their original names

plot1 + geom_histogram() # Add a new layer later
plot2 + geom_histogram() # Add a new layer later
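
If only one plot needs to be stored, `saveRDS()` and `readRDS()` may be a simpler fit than `save()` and `load()`, since they write a single object and let you pick the name when reading it back. This is a minimal sketch of that alternative, not part of the original answer; the file path and the name `restored` are made up for illustration:

```r
library(ggplot2)

dat <- data.frame(x = rnorm(10000))
plot1 <- ggplot(dat, aes(x))

# Hypothetical location, chosen only for this example
plot_file <- file.path(tempdir(), "plot1.rds")

saveRDS(plot1, plot_file)       # write a single object to disk
rm(plot1)                       # drop it from the workspace

restored <- readRDS(plot_file)  # read it back under any name
restored + geom_histogram(bins = 30)  # add the layer later
```

Either way the data frame is stored along with the plot, so the file size is about the same as with `save()`.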

UPDATE

In response to your comment, below, with respect to reducing the size:

You can convert your histogram into a density plot if you need it to be smaller. Note that you lose out on information when you do this and you're essentially just creating line plots of the density:

first.density <- density(dat$x) # Look at str(first.density): you'll see x and y
second.density <- density(dat$y) # Look at str(second.density): you'll see x and y

dat1 <- data.frame(x=first.density$x,y=first.density$y)
dat2 <- data.frame(x=second.density$x,y=second.density$y) 

plot3 <- ggplot(dat1, aes(x,y))
plot4 <- ggplot(dat2, aes(x,y))

As you can see, the object size is significantly reduced:

object.size(plot1)
object.size(plot2)
object.size(plot3)
object.size(plot4)
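
If the bars themselves matter more than a smoothed line, another option is to bin the data once and store only the bin counts. This is a rough sketch under my own assumptions, not something from the question: it uses base R's `hist(plot = FALSE)` for the binning, so the bins may not match `geom_histogram()`'s defaults, and the names `binned` and `p_binned` are made up:

```r
library(ggplot2)

set.seed(1)
dat <- data.frame(x = rnorm(10000))

# Bin once with base R; plot = FALSE returns the bins without drawing
h <- hist(dat$x, breaks = 30, plot = FALSE)

# Keep only what is needed to redraw the bars
binned <- data.frame(mid = h$mids, count = h$counts)

# Plot built on the small summary table instead of the raw data
p_binned <- ggplot(binned, aes(mid, count)) +
  geom_col(width = diff(h$breaks)[1])
```

`p_binned` can be saved and reloaded like the other plots, and `object.size(p_binned)` can be compared with the values above. The raw observations are gone, though, so the bins cannot be changed after the fact.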
Brandon Bertelsen
  • or, more simply: `save(file="~/Plots.Rdata",list=c("plot1","plot2"))` – Brandon Bertelsen Mar 30 '12 at 13:20
  • nice one! Thanks; for some reason the type of variable was throwing me off - this clarifies it greatly! – djq Mar 30 '12 at 13:43
  • Do you know if there is any way of reducing the file size through smoothing? My stored plot is still 2 MB (maybe that is reasonable), but I'm wondering if I could shrink it by smoothing the line. – djq Mar 30 '12 at 15:11
  • I've updated the answer with an example of converting the histogram to a line plot of pre-processed density. This reduces the size significantly, but also removes a significant amount of information from your saved plot. ggplot2 also has a density geom, but you'd still be saving the data along with the plot. – Brandon Bertelsen Mar 30 '12 at 16:20