
I would like to add additional geoms to a ggplot density plot, but without changing the displayed limits of the data and without having to compute the desired limits by custom code. To give an example:

library(ggplot2)
library(purrr)  # provides rbernoulli()

set.seed(12345)
N = 1000
d = data.frame(measured = ifelse(rbernoulli(N, 0.5), rpois(N, 100), rpois(N,1)))
d$fit = dgeom(d$measured, 0.6)
ggplot(d, aes(x = measured)) + geom_density() + geom_line(aes(y = fit), color = "blue")

ggplot(d, aes(x = measured)) + geom_density() + geom_line(aes(y = fit), color = "blue") + coord_cartesian(ylim = c(0,0.025))

In the first plot, the fit curve (which fits the "measured" data quite badly) obscures the shape of the measured data:

[Figure: actual output]

I would like to crop the plot so that it includes all data from the first geom but crops the fit curve, as in the second plot:

[Figure: desired output]

While I can produce the second plot with coord_cartesian, this has two disadvantages:

  1. I have to compute the limits by my own code (which is cumbersome and error-prone)
  2. Computing the limits by my own code is not compatible with faceting. As far as I know, coord_cartesian cannot take per-facet axis limits, yet I need to combine the plot with facet_wrap(scales = "free").

The desired output would be achieved, if the second geom was not considered when computing coordinate limits - is that possible without computing the limits in custom R code?

The question "R: How do I use coord_cartesian on facet_grid with free-ranging axis" is related, but does not have a satisfactory answer.

Martin Modrák
  • I don't think it can be done in the context of `facet_wrap`. A workaround might be to manually crop the data beforehand so that nothing lies beyond the desired plotting limits and `ggplot2` therefore doesn't try to resize the axes. Kludgy, but I can't think of an alternative when using `facet_wrap`. – Dan Oct 24 '17 at 11:53
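
The cropping workaround from the comment above could be sketched roughly as follows. This is only an illustration under several assumptions: the grouping column `group`, the helper name `peak` and the column `fit_cropped` are made up for the example, `rbinom()` stands in for `purrr::rbernoulli()`, and the peak from `density()` only approximates what `geom_density()` draws. It also still computes limits in custom code, which is exactly what the question hopes to avoid.

```r
library(ggplot2)

set.seed(12345)
N <- 1000
# rbinom() is used here in place of purrr::rbernoulli() (an assumption of this sketch)
d <- data.frame(measured = ifelse(rbinom(N, 1, 0.5) == 1, rpois(N, 100), rpois(N, 1)))
d$fit <- dgeom(d$measured, 0.6)
d$group <- rep(letters[1:2], N / 2)  # fake group, as in the answer below

# Approximate per-group peak of the kernel density (hypothetical helper)
peak <- tapply(d$measured, d$group, function(x) max(density(x)$y))

# Drop fit values above the peak of their own group, so they cannot stretch the y axis
d$fit_cropped <- ifelse(d$fit <= unname(peak[d$group]), d$fit, NA_real_)

p <- ggplot(d, aes(x = measured)) +
  geom_density() +
  geom_line(aes(y = fit_cropped), color = "blue", na.rm = TRUE) +
  facet_wrap(~ group, scales = "free")
print(p)
```

Because the cropped values are set to NA, the blue line simply stops where the fit leaves the range of the density, and each facet keeps limits driven by its own density curve.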

2 Answers


One thing you could try is to scale fit and use geom_density(aes(y = ..scaled..)).

Scaling fit between 0 and 1:

d$fit_scaled <- (d$fit  - min(d$fit)) / (max(d$fit) - min(d$fit))

Use fit_scaled and ..scaled..:

ggplot(d, aes(x = measured)) + 
  geom_density(aes(y = ..scaled..)) + 
  geom_line(aes(y = fit_scaled), color = "blue")

output_1
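
In recent ggplot2 releases the `..scaled..` notation is deprecated in favour of `after_stat(scaled)`. A self-contained version of the plot above in that style might look like the sketch below. It assumes ggplot2 3.3.0 or later and uses `rbinom()` in place of `purrr::rbernoulli()` to avoid the extra dependency.

```r
library(ggplot2)

set.seed(12345)
N <- 1000
# rbinom() replaces purrr::rbernoulli() here (an assumption of this sketch)
d <- data.frame(measured = ifelse(rbinom(N, 1, 0.5) == 1, rpois(N, 100), rpois(N, 1)))
d$fit <- dgeom(d$measured, 0.6)

# Min-max scale the fit to [0, 1], as in the answer
d$fit_scaled <- (d$fit - min(d$fit)) / (max(d$fit) - min(d$fit))

# after_stat(scaled) is the density divided by its maximum, so both curves peak at 1
p <- ggplot(d, aes(x = measured)) +
  geom_density(aes(y = after_stat(scaled))) +
  geom_line(aes(y = fit_scaled), color = "blue")
print(p)
```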

This can be combined with facet_wrap():

d$group <- rep(letters[1:2], 500) #fake group

ggplot(d, aes(x = measured)) + 
  geom_density(aes(y = ..scaled..)) + 
  geom_line(aes(y = fit_scaled), color = "blue") + 
  facet_wrap(~ group, scales = "free")

output_2
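
One detail to keep in mind with facets: the scaled density is normalised separately for each group, whereas fit_scaled above is normalised over the whole data set. If the fit differs between facets, it may make sense to rescale it per group as well. A minimal sketch using base R's `ave()` follows; the helper `rescale01` and the column `fit_scaled_by_group` are names invented for this example, and `rbinom()` again stands in for `purrr::rbernoulli()`.

```r
library(ggplot2)

set.seed(12345)
N <- 1000
# rbinom() replaces purrr::rbernoulli() here (an assumption of this sketch)
d <- data.frame(measured = ifelse(rbinom(N, 1, 0.5) == 1, rpois(N, 100), rpois(N, 1)))
d$fit <- dgeom(d$measured, 0.6)
d$group <- rep(letters[1:2], N / 2)  # fake group

# Hypothetical helper: min-max scaling to [0, 1]
# (returns NaN if all values in a group are identical)
rescale01 <- function(x) (x - min(x)) / (max(x) - min(x))

# Apply the scaling within each group instead of over the whole column
d$fit_scaled_by_group <- ave(d$fit, d$group, FUN = rescale01)

p <- ggplot(d, aes(x = measured)) +
  geom_density(aes(y = after_stat(scaled))) +
  geom_line(aes(y = fit_scaled_by_group), color = "blue") +
  facet_wrap(~ group, scales = "free")
print(p)
```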

An option that does not scale the data:

You can use the function multiplot() from http://www.cookbook-r.com/Graphs/Multiple_graphs_on_one_page_(ggplot2)/

multiplot <- function(..., plotlist=NULL, file, cols=1, layout=NULL) {
  library(grid)

  # Combine the plots passed via ... and via plotlist into one list
  plots <- c(list(...), plotlist)
  numPlots = length(plots)

  # If no layout is given, derive one from cols (cells are numbered column by column)
  if (is.null(layout)) {
    layout <- matrix(seq(1, cols * ceiling(numPlots/cols)),
                    ncol = cols, nrow = ceiling(numPlots/cols))
  }

  if (numPlots==1) {
    print(plots[[1]])
  } else {
    # Set up a page with one viewport cell per layout entry
    grid.newpage()
    pushViewport(viewport(layout = grid.layout(nrow(layout), ncol(layout))))

    # Print each plot in the cell(s) whose layout value equals its index
    for (i in 1:numPlots) {
      matchidx <- as.data.frame(which(layout == i, arr.ind = TRUE))
      print(plots[[i]], vp = viewport(layout.pos.row = matchidx$row,
                                      layout.pos.col = matchidx$col))
    }
  }
}

With this function you can combine the two plots, which makes it easier to read them:

multiplot(
  ggplot(d, aes(x = measured)) + 
    geom_density() +
    facet_wrap(~ group, scales = "free"),
  ggplot(d, aes(x = measured)) +  
    geom_line(aes(y = fit), color = "blue") + 
    facet_wrap(~ group, scales = "free")
)

This will give you:

output_3

And if you want to compare groups next to each other, you can use facet_grid() instead of facet_wrap() with cols = 2 in multiplot():

multiplot(
  ggplot(d, aes(x = measured)) + 
    geom_density() +
    facet_grid(group ~ ., scales = "free"),
  ggplot(d, aes(x = measured)) +  
    geom_line(aes(y = fit), color = "blue") + 
    facet_grid(group ~ ., scales = "free"),
  cols = 2
)

And it looks like this:

output_4

clemens
  • I like the trick, but I think it obscures the difference between the two curves: while I can see that they are different, I am unable to tell in which regions the fit is above or below the actual density. – Martin Modrák Oct 24 '17 at 13:59

You can try to calculate the maximum y-limit first, then plot.

library(dplyr)  # provides %>% and mutate()

# Store the peak of the kernel density estimate, rounded to two decimals
d1 <- d %>% 
  mutate(max_dens=round(max(density(measured)$y), 2))

ggplot(d1, aes(x=measured)) + 
  geom_line(aes(y=fit), color = "blue") +
  geom_density() + 
  coord_cartesian(ylim = c(0, unique(d1$max_dens)))
Roman
  • Sorry, if that was not clear from my question, but this is exactly what I want to avoid as it is cumbersome and not compatible with faceting. – Martin Modrák Oct 24 '17 at 10:36
  • Then please update your question and include an example with facets. – Roman Oct 24 '17 at 10:40
  • I've tried formulating the question with adding faceting to the example, but I believe it only obscures the point (it makes the code longer and does not let me produce a clear "desired" output). I tried to further clarify. Note that the requirement to be compatible with facets and to avoid computing the limits myself was already present in the first version of the question, so I believe your answer to be irrelevant to the question and thus worth removing. – Martin Modrák Oct 24 '17 at 10:51