ggplot2: fit logarithmic stat_smooth to geom_ribbon

Question

I'm trying to plot fitted model effects in ggplot2 as an alternative to the plots returned by the effects package, and I'm running into trouble using stat_smooth to fit log-transformed confidence bands via geom_ribbon. Unlike typical uses of geom_ribbon, I don't need to calculate the bands (the eff object already gives me their limits); I just need to log-transform them. There's plenty out there on how to do this for geom_line (e.g., R, ggplot2: Fit curve to scatter plot), but so far I haven't found anything for geom_ribbon.

The data:

myEffs <- structure(list(TargetVowelDur = c(0.03, 0.4, 0.8, 1, 2), fit = c(-0.467790933985126, 
0.823476426481035, 1.16901542809292, 1.28025414059112, 1.625793142203
), se = c(0.087385175843338, 0.0895697786138634, 0.0922444075008412, 
0.0932736493340376, 0.0969532573361368), lower = c(-0.639066303684154, 
0.647919224725754, 0.98821594070963, 1.09743733420847, 1.43576428623844
), upper = c(-0.296515564286098, 0.999033628236315, 1.34981491547621, 
1.46307094697376, 1.81582199816757)), class = "data.frame", row.names = c(NA, 
-5L), transformation = function (eta) 
eta, .Names = c("TargetVowelDur", "fit", "se", "lower", "upper"
))

Plotting with geom_line as-is yields 4 straight segments connecting the 5 points, not a logarithmic curve, so the standard solution is to give geom_line a smoothing stat:

library(ggplot2)
p1 <- ggplot(myEffs, aes(x=TargetVowelDur, y=fit)) +
  geom_line(stat="smooth", method="lm", formula=y~log(x))
p1

All good. By the same logic, we should be able to use stat = "smooth" with geom_ribbon, but doing so leaves the plot unchanged:

p2 <- p1 + 
  geom_ribbon(aes(ymin=lower, ymax=upper), stat="smooth", method="lm", formula=y~log(x))
p2

If we look inside the build of p2, we find that ymin and ymax for the geom_ribbon layer are identical (and equal to y), even though the upper and lower columns are not:

> print(lapply(ggplot_build(p2)$data, head))
[[1]]
           x           y        ymin        ymax           se PANEL group colour size linetype alpha
1 0.03000000 -0.46779093 -0.46779093 -0.46779093 2.568169e-15     1    -1  black  0.5        1    NA
2 0.05493671 -0.16620173 -0.16620173 -0.16620173 2.136541e-15     1    -1  black  0.5        1    NA
3 0.07987342  0.02037031  0.02037031  0.02037031 1.887702e-15     1    -1  black  0.5        1    NA
4 0.10481013  0.15581841  0.15581841  0.15581841 1.720023e-15     1    -1  black  0.5        1    NA
5 0.12974684  0.26221720  0.26221720  0.26221720 1.598524e-15     1    -1  black  0.5        1    NA
6 0.15468354  0.34985293  0.34985293  0.34985293 1.506906e-15     1    -1  black  0.5        1    NA

[[2]]
           x           y        ymin        ymax           se PANEL group colour   fill size linetype alpha
1 0.03000000 -0.46779093 -0.46779093 -0.46779093 2.568169e-15     1    -1     NA grey20  0.5        1    NA
2 0.05493671 -0.16620173 -0.16620173 -0.16620173 2.136541e-15     1    -1     NA grey20  0.5        1    NA
3 0.07987342  0.02037031  0.02037031  0.02037031 1.887702e-15     1    -1     NA grey20  0.5        1    NA
4 0.10481013  0.15581841  0.15581841  0.15581841 1.720023e-15     1    -1     NA grey20  0.5        1    NA
5 0.12974684  0.26221720  0.26221720  0.26221720 1.598524e-15     1    -1     NA grey20  0.5        1    NA
6 0.15468354  0.34985293  0.34985293  0.34985293 1.506906e-15     1    -1     NA grey20  0.5        1    NA

> myEffs$upper - myEffs$lower
[1] 0.3425507 0.3511144 0.3615990 0.3656336 0.3800577
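One way to see why the band collapses (a sketch, assuming stat = "smooth" with method = "lm" fits the same model as calling lm() on the same formula): the smoothing stat only models the y aesthetic, and the ymin/ymax it outputs come from its own confidence band for that fit, not from the mapped lower and upper columns. Here fit is an exact linear function of log(TargetVowelDur), so the residual standard error, and with it the band, is essentially zero. The variable names below are illustrative.

```r
# Focal predictor and fitted values from myEffs (copied from the dput above)
x   <- c(0.03, 0.4, 0.8, 1, 2)
fit <- c(-0.467790933985126, 0.823476426481035, 1.16901542809292,
         1.28025414059112, 1.625793142203)

# The model that stat = "smooth", method = "lm", formula = y ~ log(x)
# fits when y = fit
m <- lm(fit ~ log(x))

# Residual standard error: essentially zero, because fit is an exact
# linear function of log(x)
resid_se <- sigma(m)

# Standard error of the prediction along an 80-point grid (stat_smooth's
# default n); the smoother sets ymin/ymax to y -/+ a t-quantile times this
grid_x     <- seq(min(x), max(x), length.out = 80)
pred       <- predict(m, newdata = data.frame(x = grid_x), se.fit = TRUE)
max_se_fit <- max(pred$se.fit)

print(c(resid_se = resid_se, max_se_fit = max_se_fit))
```

Both numbers print as tiny values, which is why ymin, ymax and y coincide in the built data.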

How do I get stat_smooth and geom_ribbon to play nice together?

Why not sample more points between c(0.03, 2) and obtain fitted values and CI? The value obtained with `smooth` might be different from what you would obtain from your model. — mt1022, Aug 10 '18 at 07:19
I agree with @mt1022 that the solution is to get predicted values at a much finer resolution of your `x` variable. For example, you could make your `x` for predictions as `seq(.03, 2, by = .1)`. Then once you get the predicted values of `y` and the CI, you'll be able to use `geom_line()` and `geom_ribbon()` directly. If things are still not smooth enough, use a smaller `by` in `seq()`. — aosmith, Aug 10 '18 at 14:57
So in other words, use `geom_line()` and `geom_ribbon()` on the predicted points themselves rather than a smooth. I like it! @mt1022, do you mind writing this up as an answer? I'll edit it to provide the data and accept it as the best answer. — Dan Villarreal, Aug 12 '18 at 23:34
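A sketch of the fine-grid approach suggested in the comments, for the case where only the eff table (not the fitted model) is at hand. It assumes the model is linear in log(TargetVowelDur), which the exactly log-linear fit column suggests. Under that assumption fit = a + b·log(x), and se² = Var(a) + 2·Cov(a, b)·log(x) + Var(b)·log(x)² is a quadratic in log(x), so both can be recovered from the five rows and evaluated at any x; the interval multiplier z is recovered as (upper - fit)/se (about 1.96 here). The names eff, fit_mod, var_mod and grid are illustrative. With the original model object, predicting on the grid directly (e.g. with predict(), or the xlevels argument of effects::Effect()) would be the more direct route.

```r
# Relevant columns of myEffs (values copied from the dput above)
eff <- data.frame(
  TargetVowelDur = c(0.03, 0.4, 0.8, 1, 2),
  fit   = c(-0.467790933985126, 0.823476426481035, 1.16901542809292,
            1.28025414059112, 1.625793142203),
  se    = c(0.087385175843338, 0.0895697786138634, 0.0922444075008412,
            0.0932736493340376, 0.0969532573361368),
  upper = c(-0.296515564286098, 0.999033628236315, 1.34981491547621,
            1.46307094697376, 1.81582199816757)
)
eff$lx <- log(eff$TargetVowelDur)

# ASSUMPTION: model is linear in log(TargetVowelDur), so
#   fit  = a + b * lx
#   se^2 = Var(a) + 2 Cov(a, b) lx + Var(b) lx^2
fit_mod <- lm(fit ~ lx, data = eff)
var_mod <- lm(I(se^2) ~ lx + I(lx^2), data = eff)

# Multiplier used by the original intervals (about 1.96 here)
z <- mean((eff$upper - eff$fit) / eff$se)

# Evaluate fit and band on a fine grid of x
grid <- data.frame(TargetVowelDur = seq(0.03, 2, length.out = 200))
grid$lx    <- log(grid$TargetVowelDur)
grid$fit   <- unname(predict(fit_mod, newdata = grid))
grid$se    <- sqrt(unname(predict(var_mod, newdata = grid)))
grid$lower <- grid$fit - z * grid$se
grid$upper <- grid$fit + z * grid$se

# Plot the precomputed curve and band with the default identity stat
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p3 <- ggplot(grid, aes(x = TargetVowelDur, y = fit)) +
    geom_ribbon(aes(ymin = lower, ymax = upper), fill = "grey70", alpha = 0.5) +
    geom_line()
  print(p3)
}
```

Because the grid already holds the band limits, no smoothing stat is involved, so geom_ribbon() draws the mapped ymin/ymax as given. Increasing length.out makes the curve smoother.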

Wimpel · Answer 1 · 2018-08-10T07:34:51.763

My solution is to plot three smoothed lines (fit, upper and lower), and then use the data of the 'upper' and 'lower' lines to create a grey area: the ribbon.

library(ggplot2)
g1 <- ggplot(myEffs) + 
  geom_line(aes(x = TargetVowelDur, y = fit), stat = "smooth", method = "lm", formula=y~log(x)) + 
  geom_line(aes(x = TargetVowelDur, y = upper), color = "red", stat = "smooth", method = "lm", formula=y~log(x)) + 
  geom_line(aes(x = TargetVowelDur, y = lower), color = "blue", stat = "smooth", method = "lm", formula=y~log(x))

g1

# build plot object for rendering 
gg1 <- ggplot_build(g1)

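# extract the smoothed y values of the upper and lower lines
# (all three layers share the same 80-point x grid, so the rows line up;
#  layer 2 is 'upper' and layer 3 is 'lower')
df2 <- data.frame(x = gg1$data[[1]]$x,
                  ymin = gg1$data[[3]]$y,
                  ymax = gg1$data[[2]]$y)

# use the smoothed limits to add the ribbon to the plot
g1 + geom_ribbon(data = df2, aes(x = x, ymin = ymin, ymax = ymax), fill = "grey", alpha = 0.4)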

Based on @Henrik's answer in this post.
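The same idea can be sketched without ggplot_build(), by fitting y ~ log(x) to each column with lm() and predicting on a grid, which avoids depending on the order of layers in the built plot. Like the answer above, this treats the upper and lower limits as if they were themselves log-linear in x, which is an approximation (the half-width z·se is not exactly linear in log(x)). The helper smooth_col() is a hypothetical name defined here for illustration.

```r
# Relevant columns of myEffs (values copied from the question's dput)
eff <- data.frame(
  TargetVowelDur = c(0.03, 0.4, 0.8, 1, 2),
  fit   = c(-0.467790933985126, 0.823476426481035, 1.16901542809292,
            1.28025414059112, 1.625793142203),
  lower = c(-0.639066303684154, 0.647919224725754, 0.98821594070963,
            1.09743733420847, 1.43576428623844),
  upper = c(-0.296515564286098, 0.999033628236315, 1.34981491547621,
            1.46307094697376, 1.81582199816757)
)

# 80-point grid over the data range, matching stat_smooth's default n
grid <- data.frame(TargetVowelDur = seq(min(eff$TargetVowelDur),
                                        max(eff$TargetVowelDur),
                                        length.out = 80))

# Hypothetical helper: fit col ~ log(TargetVowelDur) and predict on the grid,
# as each smoothed geom_line layer does
smooth_col <- function(col) {
  m <- lm(reformulate("log(TargetVowelDur)", response = col), data = eff)
  unname(predict(m, newdata = grid))
}
grid$fit   <- smooth_col("fit")
grid$lower <- smooth_col("lower")
grid$upper <- smooth_col("upper")

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  g2 <- ggplot(grid, aes(x = TargetVowelDur)) +
    geom_ribbon(aes(ymin = lower, ymax = upper), fill = "grey", alpha = 0.4) +
    geom_line(aes(y = fit))
  print(g2)
}
```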
