
Matplotlib allows you to rasterize individual elements of a plot and save the result as a mixed pixel/vector graphic (.pdf) (see e.g. this answer). How can the same be achieved in R with ggplot2?


The following is a toy problem in which I would like to rasterize only the geom_point layer.

library(ggplot2)

set.seed(1)
x <- rlnorm(10000,4)
y <- 1+rpois(length(x),lambda=x/10+1/x)
z <- sample(letters[1:2],length(x), replace=TRUE)

p <- ggplot(data.frame(x,y,z),aes(x=x,y=y)) +
  facet_wrap("z") +
  geom_point(size=0.1,alpha=0.1) +
  scale_x_log10()+scale_y_log10() +
  geom_smooth(method="gam",formula = y ~ s(x, bs = "cs"))
print(p)
ggsave("out.pdf", p)

When saved as .pdf as is, Adobe Reader DC needs ~1s to render the figure. Below you can see a .png version: out.png

Of course, it is often possible to avoid the problem by not plotting raw data

jan-glx
  • This can be a real problem: Consider the two versions of a scientific article: https://arxiv.org/abs/1501.01332v2 (all figures vector) vs https://arxiv.org/abs/1501.01332v3 (all figures rasterized). The first one may jam your printer or pdf viewer while the second is not as sharp while having a much larger file size. – jan-glx Nov 10 '17 at 12:39
  • https://stackoverflow.com/a/42059772 – baptiste Nov 11 '17 at 08:36
  • As a workaround, saving the entire plot as png with `dpi=600` or even `dpi=1200` should provide reasonably sharp raster images without generating huge files. png was specifically designed for line graphics. – Claus Wilke Dec 10 '17 at 18:38
  • @YAK Have you got an example plot that you wish to save? My immediate thought is to recommend you take a look at grConvert and grImport2. https://www.stat.auckland.ac.nz/~paul/R/grImport2/grImport2.pdf. An example plot would help as we could map to your example. – Technophobe01 Dec 12 '17 at 03:15
  • @Technophobe01 added one now. Interesting suggestion to modify the resulting vector graphic (if that is what you meant). How would you go about preserving the layer affiliation? – jan-glx Dec 13 '17 at 13:54

2 Answers


Thanks to the ggrastr package by Viktor Petukhov & Evan Biederstedt and work by Teun van den Brand, it is now possible to rasterize any individual ggplot layer by wrapping it in ggrastr::rasterise():

# install.packages('remotes')
# remotes::install_github('VPetukhov/ggrastr')
library(ggplot2)

# df is any data frame with numeric columns x and y
ggplot(df, aes(x=x, y=y)) +
      # this layer will be rasterized:
      ggrastr::rasterise(geom_point(size=0.1, alpha=0.1)) +
      # this one will still be "vector":
      geom_smooth()

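Previously, only a few geoms were supported (as of 2018-08-13, only geom_point and geom_tile). To rasterize such a layer, you had to replace the geom with its ggrastr counterpart, e.g. geom_point with ggrastr::geom_point_rast.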

For example:

# install.packages('devtools')
# devtools::install_github('VPetukhov/ggrastr')
library(ggplot2)

set.seed(1)
x <- rlnorm(10000, 4)
y <- 1+rpois(length(x), lambda = x/10+1/x)
z <- sample(letters[1:2], length(x), replace = TRUE)

ggplot(data.frame(x, y, z), aes(x=x, y=y)) +
  facet_wrap("z") +
  ggrastr::geom_point_rast(size=0.1, alpha=0.1) +
  scale_x_log10() + scale_y_log10() +
  geom_smooth(method="gam", formula = y ~ s(x, bs = "cs"))
ggsave("out.pdf")

This yields a PDF that contains only the geom_point layer as a raster image and everything else as vector graphics. Overall the figure looks like the one in the question, but zooming in reveals the difference: [image: zoom-in view of example picture]

Compare this to an all-raster graphic: [image: all-raster for comparison]

jan-glx

I think you've set yourself up to not have this question answered. You write:

I expect an answer to provide an extension to ggplot2 that allows exporting plots with rasterized layers with minimal changes to existing plotting commands, i.e. as a wrapper for geom_... commands, or as an additional parameter to these, or a ggsave command that expects a list of unevaluated parts of a plot command (every second to be rasterized), not a hacky workaround as provided in the linked question.

This is a major development effort that could easily require several weeks or more of effort by a highly skilled developer. It's unlikely anybody will do this just because of a Stack Overflow question. In lieu of a functioning implementation, I'll describe here how one could implement what you're asking for and why it's rather challenging.

The players

Let's start with the key players we'll be dealing with. At the highest level sits the ggplot2 library. It takes data frames and turns them into figures. ggplot2 itself doesn't know anything about low-level drawing, though. It only deals with lines, polygons, text, etc., which it hands off to the grid library in the form of graphics objects (grobs).

The grid library itself is a fairly high-level library. It also doesn't know much about low-level drawing. It primarily deals with lines, polygons, text, etc., which it hands off to an R graphics device. The device does the actual drawing.
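To make this hand-off concrete, here is a minimal sketch of how one can look at the grob tree that ggplot2 builds before anything is drawn. It uses the built-in mtcars data purely for illustration; the exact names of the children inside the panel depend on the ggplot2 version, so treat them as an implementation detail.

```r
library(ggplot2)
library(grid)

p <- ggplot(mtcars, aes(wt, mpg)) + geom_point()

# ggplotGrob() builds the plot and returns the grob tree (a gtable)
# that grid will later draw; nothing is rendered at this point.
g <- ggplotGrob(p)
class(g)

# The panel is one cell of that table. Its children include the grobs
# created by the individual layers (here: the points).
panel <- g$grobs[[grep("^panel", g$layout$name)[1]]]
names(panel$children)

# Drawing is a separate step, performed on whatever device is current.
grid.newpage()
grid.draw(g)
```

Any rasterization scheme has to intervene somewhere between building `g` and drawing it.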

There are many different R graphics devices. Enter ?Devices in an R command line to see an incomplete list. There are vector-graphics devices, such as pdf, postscript, or svg, raster devices such as png, jpeg, or tiff, and interactive devices such as X11 or quartz. Obviously, rasterization as a concept only makes sense for vector-graphics devices, since raster devices rasterize everything anyway. Importantly, neither ggplot2 nor grid knows or cares which graphics device you're currently drawing on. They deal with graphical objects that can be drawn on any device.
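As a small illustration of that device independence, the sketch below draws one and the same grob on a vector device and on a raster device. The file names are temporary files chosen for the example, and the png() part is guarded because PNG support is not guaranteed in every R build.

```r
library(grid)

# One grob, created without reference to any device
g <- circleGrob(r = 0.4, gp = gpar(fill = "grey80"))

pdf_file <- tempfile(fileext = ".pdf")
png_file <- tempfile(fileext = ".png")

# Vector device: the circle ends up in the file as a path
pdf(pdf_file, width = 3, height = 3)
grid.draw(g)
invisible(dev.off())

# Raster device: the circle ends up in the file as pixels
if (capabilities("png")) {
  png(png_file, width = 3, height = 3, units = "in", res = 300)
  grid.draw(g)
  invisible(dev.off())
}
```

The grob itself carries no information about how it will be stored, which is why a per-layer rasterize switch has no obvious place to live at the device level.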

Ideal high-level interface

The high-level interface should consist of an option rasterize in the layer() function of ggplot2. In this way, one could simply write, e.g., geom_point(rasterize = TRUE) to rasterize the points layer. This would work transparently for all geoms and stats, since they all call layer().

Possible implementations

I see four possible routes of implementation, ordered from least to most feasible.

1. Ideally, the layer() function would simply hand off the rasterize option to the grid library, which would hand it off to the graphics device to tell it which parts of the plot to rasterize. This approach would require major changes in the graphics device API. I don't see this happening. Not in my lifetime, at least.

2. Alternatively, one could write a new grob type that can take any arbitrary grob and rasterize it on demand when the grob is drawn on a graphics device. This approach would not require changes in the graphics device API, but it would require detailed knowledge of the low-level implementation of the grid library. It would also possibly make interactive viewing of such figures very slow.

3. A slightly simpler alternative to 2. would be to rasterize the arbitrary grob only once, on grob construction, and then reuse the result whenever that grob is drawn. This would be quite a bit faster on interactive graphics devices, but the drawing would get distorted if the aspect ratio is changed interactively. Nevertheless, since the primary use of this functionality would be to generate pdf output (I assume), this option might be sufficient.

4. Finally, rasterization could also happen in the layer() function, and that function could simply place a regular raster grob into the grob tree. That solution is similar to the technique described here. Technically, it's not much different from 3. Either way, one needs to write code to rasterize a grob tree and then replace it by a raster grob.

Technical hurdles

To rasterize parts of the grob tree, we'd have to send them to an R raster graphics device to render. However, there isn't one that renders to memory. So, one would have to render to a temporary file (e.g., using png()), and then read the file back in. That's possible but ugly. It also depends on functionality (such as png()) that isn't guaranteed to be available on every R installation.
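A rough sketch of that temporary-file round trip could look like the following. The helper `rasterize_grob()` and its arguments are made up for this illustration and are not part of grid or ggplot2; reading the file back relies on the CRAN package png, which is an additional assumption. The sketch also ignores any device that may already be open.

```r
library(grid)

# Hypothetical helper (not part of grid or ggplot2): draw `grob` on an
# off-screen PNG device and return the pixels wrapped in a raster grob.
rasterize_grob <- function(grob, width = 3, height = 3, dpi = 300) {
  tmp <- tempfile(fileext = ".png")
  on.exit(unlink(tmp), add = TRUE)

  # Render to a temporary file, since no base device renders to memory
  png(tmp, width = width, height = height, units = "in", res = dpi,
      bg = "transparent")
  grid.draw(grob)
  dev.off()

  # Read the pixels back in; the raster grob fills its viewport
  pixels <- png::readPNG(tmp)
  rasterGrob(pixels, width = unit(1, "npc"), height = unit(1, "npc"))
}

points <- pointsGrob(x = runif(1000), y = runif(1000), pch = 16,
                     gp = gpar(cex = 0.3, alpha = 0.3))

# Only attempt the conversion if both assumptions hold on this machine
if (capabilities("png") && requireNamespace("png", quietly = TRUE)) {
  points_raster <- rasterize_grob(points)
}
```

Because the size in inches is fixed when the helper is called, the result corresponds to rasterizing once up front, with the aspect-ratio caveat mentioned for option 3.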

Second, to render parts of the grob tree separately from the overall rendering, we'll have to open a new graphics device in addition to the one currently open. That's possible but can lead to unexpected bugs. I'm dealing with such bugs all the time, see e.g. here or here for issues related to code using this technique. Whoever implements the rasterization functionality would have to deal with such issues.
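One source of such bugs is that closing the extra device does not necessarily make the previously active device current again. A minimal sketch of the usual bookkeeping is shown below; `with_temp_device()` is a hypothetical helper written for this example, and `pdf(NULL)` is used only as a stand-in for devices that write no file.

```r
# Hypothetical helper: run `code` on a freshly opened device, then close
# that device and make the previously active one current again.
with_temp_device <- function(open_device, code) {
  prev <- dev.cur()          # device 1 is the "null device" (nothing open)
  open_device()
  tmp <- dev.cur()
  on.exit({
    dev.off(tmp)
    if (prev > 1) dev.set(prev)
  }, add = TRUE)
  force(code)                # `code` is evaluated lazily, i.e. only now
}

# Two devices are open; the second one (b) is the active one
pdf(NULL); a <- dev.cur()
pdf(NULL); b <- dev.cur()

with_temp_device(function() pdf(NULL), grid::grid.rect())

# Without the dev.set() call above, device `a` would now be current
cur_after <- dev.cur()
restored <- as.integer(cur_after) == as.integer(b)

graphics.off()
```

This handles only the simplest case. Code that itself opens or closes devices while the temporary device is active would need more care.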

Finally, we'll have to get the rasterization code accepted into the ggplot2 library, since we need to replace the layer() function and I don't think there's a way to do that from a separate package. Given how hackish the rasterization solutions are going to be (see previous two paragraphs), that may be a tall order.

Claus Wilke
  • Great analysis! Hopefully it is more of a motivation for somebody to implement a solution than my question! Do you know where to find the documentation of the `graphics device API`? When you said "Not in my lifetime, at least.", did you actually mean "Not in the lifetime of the R-core team"? Or what is the reason for this pessimistic statement? – jan-glx Dec 13 '17 at 14:31
  • Changing the API of the graphics devices is a major undertaking, where all device drivers and all client code need to be changed. That's not going to happen easily. We're already missing a lot of obviously important things, such as gradient fills, pattern fills, image fills, etc. Rasterization is too specific of a problem to be considered high priority. The response would likely be: Rasterize at a higher level and then just hand the rasterized image to the graphics device. And that's not entirely unreasonable. – Claus Wilke Dec 13 '17 at 22:10
  • I think that approach #2 is probably both the most general and the most feasible. I believe it would involve mostly copying the code from `rastergrob` and reimplementing the [`drawDetails` function](https://github.com/wch/r-source/blob/trunk/src/library/grid/R/primitives.R#L1222) so it rasterizes a grob on the fly and then sends the rasterized image to the low-level graphics device. @baptiste likely has a better understanding of this than me and may see some pitfalls I'm not aware of. – Claus Wilke Dec 13 '17 at 22:17
  • (in case it didn't show up in your Twitter feed yet) you might want to check out the `ggrastr` package, which provides a workaround for some geoms similar to the one by @baptiste but based on `grid::grid.cap`. – jan-glx Aug 13 '18 at 14:51
  • `ggrastr` now works for all geoms with `ggrastr::rasterize`; it seems to be [implementing](https://github.com/VPetukhov/ggrastr/pull/17/files) approach 4. – jan-glx Sep 21 '20 at 19:21