
Let me explain in pictures what I mean:

library(ggplot2)
set.seed(1)  ## dummy data.frame:
df <- data.frame(value1 = sample(5:15, 20, replace = TRUE),
                 value2 = sample(5:15, 20, replace = TRUE),
                 var1 = c(rep('type1', 10), rep('type2', 10)),
                 var2 = c('a', 'b', 'c', 'd'))  # recycled to 20 rows

## Plot 1 

ggplot() +
  geom_point(data = df, aes(value1, value2)) +
  facet_grid(~var1) +
  coord_fixed()

ggsave("plot_2facet.pdf", height=5, units = 'in')
    #Saving 10.3 x 5 in image

## Plot 2  which I want to save in a separate file (!)

ggplot() +
  geom_point(data = df, aes(value1, value2)) +
  facet_grid(~var2) +
  coord_fixed()

ggsave("plot_4facet.pdf", height=5, units = 'in')
    #Saving 10.3 x 5 in image


What happens here is that the devices have the same height, but the plots have different heights. I would like the plots to have the same height.

In the code above, I tried to only specify the height, but ggsave then just takes a fixed width dimension for the device.

I tried theme(plot.margin = margin(t=1,b=1)), but this did not change anything.

Taking out coord_fixed() gives plots with the same height:


But I would like to use coord_fixed().

Is there a solution for this, or do I need to "guess" the width dimensions of the device to get the correct plot height?

Cheers


Edit

The plots should ideally be created in separate devices/ files.

tjebo
  • Also, the answer should not only apply to these example plots with their specific numbers of facets. It should be generally applicable to any number of facets in the plot. Is this possible with ggplot at all? I am looking forward to your answers – tjebo Feb 05 '18 at 14:37

2 Answers


This is somewhat tricky with ggplot, so please forgive the long, convoluted, and admittedly a bit hacky answer. The basic problem is that with coord_fixed, the height of the y-axis becomes inextricably linked to the length of the x-axis.

There are two ways we can break this dependency:

  1. by using the expand argument of scale_y_continuous. This allows us to extend the y axis by a given amount beyond the range of the data. The tricky bit is knowing how much to expand it, because this depends in a hard-to-predict way on all elements of the plot, including how many facets there are and the size of axis titles and labels etc.

  2. by allowing the width of the two plots to differ. The tricky thing here is, as above, how to find the correct width as this depends on the various other aspects of the plots.

First I show how we can solve the first version (how much to expand the y-axis). Then using a similar approach and a little extra trickery we can also solve the varying width version.
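To make the dependency described above concrete, here is a minimal sketch of an idealised model. It is an assumption for illustration, not how ggplot2 computes its layout: it ignores axis text, strips, panel spacing and margins, and the function name `panel_height` and all numbers are hypothetical. It only shows why, at a fixed device width, more facets in a row lead to shorter panels under `coord_fixed()`.

```r
# Idealised model (assumption): the device width is split evenly between
# the facets, and coord_fixed() then sets the panel height from the
# panel width and the data ranges. The height cannot exceed max_height.
panel_height <- function(device_width, n_facets, x_range, y_range,
                         max_height, ratio = 1) {
  panel_width <- device_width / n_facets
  min(max_height, panel_width * ratio * y_range / x_range)
}

# Hypothetical numbers: 10 in wide, 5 in tall, data spanning 10 units each way
h2 <- panel_height(10, n_facets = 2, x_range = 10, y_range = 10, max_height = 5)
h4 <- panel_height(10, n_facets = 4, x_range = 10, y_range = 10, max_height = 5)
c(two_facets = h2, four_facets = h4)
```

In this model the four-facet plot only reaches the height of the two-facet plot if the device is made wider or the y range is extended, which are the two options listed above.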

Solution to finding how much to expand the y-axis

Given the difficulties of predicting how large the plotting area will be (which depends on the relative sizes of all the elements of the plot), what we can do is save a dummy plot in which we shade the plot area in black, read the image file back in, then measure the size of the black area to determine how large the plot area is:

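1) let's start by assigning your plots to variables (df1 is the data frame from your question, renamed so that it does not clash with the inbuilt function df)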

p1 = ggplot(df1) +
  geom_point(aes(value1, value2)) +
  facet_grid(~var1) +
  coord_fixed()

p2 = ggplot(df1) +
  geom_point(aes(value1, value2)) +
  facet_grid(~var2) +
  coord_fixed() 

2) now we can save some dummy versions of these plots that only show a black rectangle where the plotting region is:

t_blank = theme(strip.background = element_rect(fill = NA),
      strip.text = element_text(color=NA),
      axis.title = element_text(color = NA),
      axis.text = element_text(color = NA),
      axis.ticks = element_line(color = NA))

p1 + geom_rect(aes(xmin=-Inf, xmax=Inf, ymin=-Inf, ymax=Inf), fill='black') + 
     t_blank
ggsave(fn1 <- tempfile(fileext = '.png'), height=5, units = 'in')

p2 + geom_rect(aes(xmin=-Inf, xmax=Inf, ymin=-Inf, ymax=Inf), fill='black') + 
     t_blank
ggsave(fn2 <- tempfile(fileext = '.png'), height=5, units = 'in')

3) then we read these into an array (just the first color band is enough)

library(png)
p1.saved = readPNG(fn1)[,,1]
p2.saved = readPNG(fn2)[,,1]

4) calculate the height of each plotting area (the black-shaded areas which have a value=zero)

p1.height = diff(row(p1.saved)[range(which(p1.saved==0))])
p2.height = diff(row(p2.saved)[range(which(p2.saved==0))])
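The measuring step can be tried without saving any plots. Below is a minimal sketch on a synthetic matrix standing in for one colour band of a saved PNG (1 = white, 0 = black). The helper `black_height` is a hypothetical alternative that counts rows directly; it is not part of the code above.

```r
# Synthetic "image": 10 x 8 white pixels with a black block in rows 3 to 7
img <- matrix(1, nrow = 10, ncol = 8)
img[3:7, 2:6] <- 0

# The expression used above: which() returns column-major indices, so the
# smallest and largest index are the top-left and bottom-right black pixels.
h_diff <- diff(row(img)[range(which(img == 0))])

# Hypothetical helper: span of rows that contain at least one black pixel
black_height <- function(m) {
  black_rows <- which(rowSums(m == 0) > 0)
  if (length(black_rows) == 0) return(0L)
  diff(range(black_rows)) + 1L
}
h_rows <- black_height(img)

c(h_diff = h_diff, h_rows = h_rows)
```

The two values differ by one because `diff` measures the distance between the first and last row rather than counting rows. Since the same expression is applied to both plots, that offset is the same for each of them.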

5) Find how much we need to expand the plotting area based on these. Note that we subtract the ratio of heights from 1.1 to account for the fact that the original plots were already expanded by the default amount of 0.05 in each direction. Disclaimer -- this formula works on your example. I haven't had time to check it more broadly, and it may yet need adapting to ensure generality for other plots

height.expand = 1.1 - p2.height / p1.height

6) Now we can save the plots using this expansion factor

ggsave("plot_2facet.pdf", p1, height=5, units = 'in')
ggsave("plot_4facet.pdf", p2 + scale_y_continuous(expand=c(height.expand, 0)), 
        height=5, units = 'in')

Solution to finding how much to alter the width

First, let's set the width of the first plot to what we want:

p1.width = 10

Now, using the same approach as in the previous section we find how tall the plotting area is in this plot.

p1 + geom_rect(aes(xmin=-Inf, xmax=Inf, ymin=-Inf, ymax=Inf), fill='black') + 
     t_blank
ggsave(fn1 <- tempfile(fileext = '.png'), height=5, width = p1.width, units = 'in')
p1.saved = readPNG(fn1, info = T)[,,1]
p1.height = diff(row(p1.saved)[range(which(p1.saved==0))])

Next, we find the minimum width the second plot must have to get the same height. (Note: we look for a minimum here because any greater width will not increase the height, which already fills the vertical space, but will simply add white space to the left and right.)

We will solve for the width using the function uniroot, which finds where a function crosses zero. To use uniroot we first define a function that calculates the height of a plot given its width as an argument, and returns the difference between that height and the height we want. The line if (x==0) x = -1e-8 in this function is a dirty trick to allow uniroot to solve a function that reaches zero but does not cross it - see here.

fn2 <- tempfile(fileext = '.png')
find.p2 = function(w){
  p = p2 + geom_rect(aes(xmin=-Inf, xmax=Inf, ymin=-Inf, ymax=Inf), fill='black') + 
           t_blank
  ggsave(fn2, p, height=5, width = w, units = 'in')
  p2.saved = readPNG(fn2, info = T)[,,1]
  p2.height = diff(row(p2.saved)[range(which(p2.saved==0))])
  x = abs(p1.height - p2.height)
  if (x==0) x = -1e-8
  x
}

N1 = length(unique(df1$var1))  # number of facets in p1
N2 = length(unique(df1$var2))  # number of facets in p2
p2.width = uniroot(find.p2, c(p1.width, p1.width*N2/N1))
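The uniroot trick can be checked in isolation. The sketch below swaps the save-and-measure step for a toy function, `toy_height`, whose shape and numbers are assumptions for illustration: the height grows with the width until it fills the available space at a width of 10, then stays flat.

```r
# Toy stand-in for the measured plot height (hypothetical numbers)
toy_height <- function(w) pmin(100 * w, 1000)
target <- 1000

# Distance from the target height. It reaches zero but never goes below
# it, so an exact zero is nudged to a tiny negative value. This gives
# uniroot the sign change it needs between the two ends of the interval.
gap <- function(w) {
  x <- abs(target - toy_height(w))
  if (x == 0) x <- -1e-8
  x
}

res <- uniroot(gap, c(5, 20))
res$root  # should be close to 10, the smallest width that reaches the target
```

Without the nudge, both ends of the interval would give non-negative values and uniroot would stop with an error about the end points not being of opposite sign.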

Now we are ready to save the plots with the correct widths to ensure they have the same height.

p1
ggsave("plot_2facet.pdf", height=5, width = p1.width, units = 'in')
p2
ggsave("plot_4facet.pdf", height=5, width = p2.width$root, units = 'in')

dww
  • yes! Brilliant. Solution 2 is what I am looking for. You are a magician. I need to check how I will manage to adapt this to different facet numbers...Solution 1 however, is not really what I was looking for - did I miss something essential?? I will need to have a really close look into your function too... It's already a massive upvote, but let me check the generalisability first, before I accept. Cheers! – tjebo Feb 09 '18 at 16:49
  • to generalise for different numbers of facets, you may need to alter the top end of the range in `uniroot`. If N1 and N2 are the number of facets in p1 and p2, respectively, then a good choice of upper end of the range would be `p1.width*N2/N1`. – dww Feb 09 '18 at 18:12
  • sorry for the delay in checking. I have tested this now with some of my real plots and it keeps throwing an error at the very first step of creating the black dummy plot: `Error in if (empty(data)) { : missing value where TRUE/FALSE needed`. Weirdly, this error disappears when I remove faceting (!) ... – tjebo Feb 11 '18 at 19:53
  • Can you post a reproducible example that throws this error? It could be related to [this](https://github.com/tidyverse/ggplot2/issues/2417)? Or maybe [this](https://github.com/tidyverse/ggplot2/issues/756). If this is a simple to solve issue, I can try to look at it here. But it's possible that this may need to be posted as a separate question – dww Feb 11 '18 at 20:45
  • BTW if I type `if (empty(df)) print('hello')`, where `df` is a function in base R rather than a data.frame, it generates the same error. Note that I used the object name `df1` in my answer to avoid possible conflicts between the data.frame and the inbuilt function df. In general, it is considered bad practice to use names of functions as variable names, and best avoided. I wonder if what you are getting is related to your object names? – dww Feb 11 '18 at 21:03
  • allow me one minor question for clarification of your answer: I still do not understand why solution 1 should be considered a solution - the plots that you get look basically the same to me as those one gets when simply not calling `coord_fixed()`? – tjebo Feb 11 '18 at 21:24
  • in version 1, `coord_fixed` still keeps the x and y scales at a fixed ratio of data units per unit length, as it should. This is achieved by allowing the data points themselves to become compressed relative to the y-axis. I agree that this is probably not what you want yourself, but it is worth leaving in the answer as it could be useful to someone else who requires the width of the two plots to remain the same (then this becomes the only way to also keep the heights the same) – dww Feb 11 '18 at 21:35

You can do this (it turns out) using the awesome egg package. I don't actually know how this works, or if it works more generally than this case; I just took a punt on the basis that ggarrange figures out the alignment. If anyone could shed light on this, that'd be great!

library(egg)
getScale <- ggarrange(p1, p2, draw = F, ncol=2)
p1_sc <- ggarrange(p1, heights = getScale$heights[2])
ggsave("plot_2facet.pdf", plot=p1_sc, height=5, units = 'in')

p2_sc <- ggarrange(p2, heights = getScale$heights[2])
ggsave("plot_4facet.pdf", plot=p2_sc, height=5, units = 'in')

Yeah, I really have no idea how this works:

getScale$heights[2]
# [1] max(1*1null, 1*1null)
class(getScale$heights[2])
# [1] "unit.list" "unit"   

EDIT: it does seem to generalise, though

p3 <- ggplot() +
    geom_point(data = df, aes(value1, value2)) +
    facet_wrap(~var2, nrow=2) +
    coord_fixed()

getScale <- ggarrange(p1, p2, p3, draw = F, ncol=3)

p1_sc <- ggarrange(p1, heights = getScale$heights[2])
ggsave("plot_2facet.pdf", plot=p1_sc, height=5, units = 'in')

p2_sc <- ggarrange(p2, heights = getScale$heights[2])
ggsave("plot_4facet.pdf", plot=p2_sc, height=5, units = 'in')

p3_sc <- ggarrange(p3, heights = getScale$heights[2])
ggsave("plot_4facet_2row.pdf", plot=p3_sc, height=5, units = 'in')
Simon Mills