
I'm a bit stuck plotting a raster with a log scale. Consider this plot for example:

ggplot(faithfuld, aes(waiting, eruptions)) +
 geom_raster(aes(fill = density))


How can I use a log scale with this geom? None of the usual methods is very satisfying:

 ggplot(faithfuld, aes(waiting, log10(eruptions))) +
   geom_raster(aes(fill = density))


 ggplot(faithfuld, aes(waiting, (eruptions))) +
   geom_raster(aes(fill = density)) + 
   scale_y_log10()


and this doesn't work at all:

 ggplot(faithfuld, aes(waiting, (eruptions))) +
   geom_raster(aes(fill = density)) + 
   coord_trans(x="log10")

Error: geom_raster only works with Cartesian coordinates

Are there any options for using a log scale with a raster?

To be precise, I have three columns of data. The z value is the one I want to use to colour the raster, and it is not computed from the x and y values. So I need to supply all three columns to the ggplot function. For example:

dat <- data.frame(x = rep(1:10, 10), 
                  y = unlist(lapply(1:10, function(i) rep(i, 10))), 
                  z = faithfuld$density[1:100])

ggplot(dat, aes(x = log(x), y = y, fill = z)) +
  geom_raster()


What can I do to get rid of those gaps in the raster?

Note that this question is related to these two:

I have been keeping an updated gist of R code that combines details from the answers to these questions (example output included in the gist). That gist is here: https://gist.github.com/benmarwick/9a54cbd325149a8ff405

Ben

1 Answer


The faithfuld dataset already has a density column, which holds the 2D density estimates for waiting and eruptions. The waiting and eruptions values in the dataset are points on a regular grid. geom_raster does not compute the density for you; it simply plots the density at the given x, y coordinates, which here are the grid points. So if you just apply a log transformation to y, you distort the spacing between the y values (originally they are equally spaced), and this is why you see gaps in your plot. I used points to visualize the effect:

library(ggplot2)
library(gridExtra)

# Use point to visualize the effect of log on the dataset
g1 <- ggplot(faithfuld, aes(x=waiting, y=eruptions)) +
  geom_point(size=0.5)    

g2 <- ggplot(faithfuld, aes(x=waiting, y=log(eruptions))) +
  geom_point(size=0.5)    

grid.arrange(g1, g2, ncol=2)    

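The same distortion can be checked numerically. The following is a minimal sketch in base R; the vector y is an illustrative stand-in for one axis of an evenly spaced grid, not the actual faithfuld values.

```r
# Illustrative stand-in for one axis of an evenly spaced grid
# (assumed values, not taken from faithfuld)
y <- seq(1.6, 5.1, length.out = 75)

# Spacing between neighbouring grid lines before and after the log transform
gap_linear <- diff(y)
gap_log <- diff(log(y))

range(gap_linear)  # minimum and maximum agree: the spacing is constant
range(gap_log)     # the spacing shrinks as y grows
```

Because the spacing is no longer constant after the transform, tiles of one fixed size cannot cover the grid without leaving gaps.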

If you really want y on a log scale and a density plot, you have to use the faithful dataset with geom_density_2d, so that the density is estimated on the transformed values.

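# Use geom_density_2d; the density is estimated on log(eruptions)
# after_stat() needs ggplot2 >= 3.3.0; older versions use ..density.. instead
ggplot(faithful, aes(x=waiting, y=log(eruptions))) +
  geom_density_2d() +
  stat_density_2d(geom="raster", aes(fill=after_stat(density)),
                  contour=FALSE)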


Update: Use geom_rect and supply custom xmin, xmax, ymin, ymax values to fit the spaces of the log scale.

Because geom_raster draws every tile at the same size, you probably have to use geom_tile or geom_rect to create the plot. My idea is to calculate how wide each tile should be and to adjust xmin and xmax for each tile so that the gaps are filled.

library(ggplot2)
library(gridExtra)

dat <- data.frame(x = rep(1:10, 10),
                  y = unlist(lapply(1:10, function(i) rep(i, 10))),
                  z = faithfuld$density[1:100])

# Original attempt: geom_raster on log(x) leaves gaps
g <- ggplot(dat, aes(x = log(x), y = y, fill = z)) +
  geom_raster()

# Tile edges on the log scale: midpoints between neighbouring log(x) values
lx <- log(sort(unique(dat$x)))
distance <- diff(lx) / 2
upper <- lx + c(distance, distance[length(distance)])
lower <- lx - c(distance[1], distance)

# Create xmin, xmax, ymin, ymax for every row
idx <- match(log(dat$x), lx)
dat$xmin <- lower[idx]
dat$xmax <- upper[idx]
dat$ymin <- dat$y - 0.5 # y is evenly spaced, so half a unit either side
dat$ymax <- dat$y + 0.5

# geom_rect only needs the four edges
# You can also use geom_tile with the width argument
g2 <- ggplot(dat, aes(xmin=xmin, xmax=xmax, ymin=ymin, ymax=ymax, fill=z)) +
  geom_rect() +
  labs(x = "log(x)", y = "y")

# Show the plots side by side
grid.arrange(g, g2, ncol=2)

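The geom_tile variant mentioned above could look roughly like this. It is a sketch, not part of the original answer: geom_tile positions a tile by its centre and width, so the tile edges on the log scale are converted to those two values first. The names tiles, centre and width are introduced here for illustration only.

```r
x <- 1:10                  # same x values as in dat
lx <- log(x)

# Tile edges: midpoints between neighbouring log(x) values
half <- diff(lx) / 2
lower <- lx - c(half[1], half)
upper <- lx + c(half, half[length(half)])

# geom_tile wants a centre and a width rather than two edges
tiles <- data.frame(
  centre = (lower + upper) / 2,
  width  = upper - lower
)

# Neighbouring tiles share an edge, so there are no gaps or overlaps
all.equal(upper[-length(upper)], lower[-1])

# The plot itself needs ggplot2 (for geom_tile and the faithfuld data)
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  dat <- data.frame(x = rep(1:10, 10),
                    y = rep(1:10, each = 10),
                    z = faithfuld$density[1:100])
  dat$centre <- tiles$centre[match(dat$x, x)]
  dat$width  <- tiles$width[match(dat$x, x)]
  p <- ggplot(dat, aes(x = centre, y = y, width = width, fill = z)) +
    geom_tile() +
    labs(x = "log(x)")
  print(p)
}
```

Note that the tile centres are not exactly log(x): with midpoint edges, each tile is centred between its two edges, which only coincides with the data value when the neighbouring gaps are equal.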

JasonWang
  • Thanks. In my actual use-case I really need to use a third variable as the density, as in the `faithfuld` dataset, so your suggestion doesn't solve my problem. Do you have any other ideas? – Ben Mar 09 '16 at 02:30
  • I am not sure what you mean by "really need to use the third variable as the density". Do you have a way to calculate the density? Another approach is to retrieve the data that `ggplot` uses to construct the plot. You can save the plot as an object like `g` and retrieve the data with `ggplot_build(g)$data`. The data contains the density. – JasonWang Mar 09 '16 at 03:09
  • Or can you edit the problem and provide a subset of your actual use-case? – JasonWang Mar 09 '16 at 03:10
  • I mean that I have three columns in my dataset, and the z column is not computed from the x or y columns. I have another way to compute the z value, independent of the plotting function. I want to use the z column as the colour values for the raster. – Ben Mar 09 '16 at 03:46
  • Thanks for your update, that solves my problem. I've edited your answer to make your solution more general (and suitable for my actual use-case). – Ben Mar 10 '16 at 14:34
  • Your last plot is an exceptionally useful case for creating surface plots for sparsely populated data. To cater for x data that is not uniformly populated, I changed your code for my use case so that there are distance, upper and lower variables for both x and y, and then updated the xmin and xmax formulas to look the same as ymin and ymax. This gives a completely dynamic method for any dataset, regardless of the spacing between the x and y values. – Jaco-Ben Vosloo Apr 04 '18 at 11:12
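
The generalisation described in the last comment, with edges computed the same way for both axes, might be sketched as follows. This is an assumption about what that commenter did, not their actual code: the helper cell_edges, the example grid and the placeholder z values are all hypothetical.

```r
# Hypothetical helper: for each value in v, return the lower and upper edge of
# its cell, placed at the midpoints between neighbouring sorted unique values.
# The outermost cells mirror their inner half-gap. Needs >= 2 unique values.
cell_edges <- function(v) {
  u <- sort(unique(v))
  half <- diff(u) / 2
  lower <- u - c(half[1], half)
  upper <- u + c(half, half[length(half)])
  idx <- match(v, u)
  list(min = lower[idx], max = upper[idx])
}

# Illustrative, unevenly spaced grid; z is a placeholder for the real values
grid <- expand.grid(x = c(1, 2, 5, 10, 50), y = c(0.1, 0.5, 1, 4))
grid$z <- seq_len(nrow(grid))

# Compute the edges on the transformed scale, separately for each axis
ex <- cell_edges(log10(grid$x))
ey <- cell_edges(log10(grid$y))
grid$xmin <- ex$min
grid$xmax <- ex$max
grid$ymin <- ey$min
grid$ymax <- ey$max

# The plot itself needs ggplot2
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(grid, aes(xmin = xmin, xmax = xmax,
                        ymin = ymin, ymax = ymax, fill = z)) +
    geom_rect() +
    labs(x = "log10(x)", y = "log10(y)")
  print(p)
}
```

Since the edges are derived from the data on each axis, the same code should work whether or not the values are evenly spaced before or after the transform.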