3

I have a data set, for which I graphed a regression (using ggplot2's stat_smooth) :

ggplot(data = mydf, aes(x=time, y=pdm)) + geom_point() + stat_smooth(col="red") 

plot1

I'd also like to have the quantiles (if it's simpler, having only the quartiles will do) using the same method. All I manage to get is the following :

ggplot(data = mydf, aes(x=time, y=pdm, z=surface)) + geom_point() + stat_smooth(col="red") + stat_quantile(quantiles = c(0.25,0.75)) 

enter image description here

Unfortunately, I can't put method="loess" in stat_quantile(), which, if I'm not mistaken, would solve my problem.

(In case it's not clear, desired behavior = non linear regressions for the quantiles, and therefore the regression for Q25 and Q75 being below and above (respectively) my red curve (and Q50, if plotted, would be my red curve)).

Thanks

François M.
  • 4,027
  • 11
  • 30
  • 81

1 Answers1

3

stat_quantile is, by default, plotting best-fit lines for the 25th and 75th percentiles at each x-value. stat_quantile uses the rq function from the quantreg package (implicitly, method="rq" in the stat_quantile call). As far as I know, rq doesn't do loess regression. However, you can use other flexible functions for the quantile regression. Here are two examples:

B-Spline:

library(splines)

stat_quantile(formula=y ~ bs(x, df=4), quantiles = c(0.25,0.75))

Second-Order Polynomial:

stat_quantile(formula=y ~ poly(x, 2), quantiles = c(0.25,0.75))

stat_quantile is still using rq, but rq accepts formulas of the type listed above (if you don't supply a formula, then stat_quantile is implicitly using formula=y~x). If you use the same formula in geom_smooth as for stat_quantile, you'll have consistent regression methods being used for the quantiles and for the mean expectation.

eipi10
  • 91,525
  • 24
  • 209
  • 285
  • Thank you for your input. Isn't it possible to put `loess()` from `stats` in the `formula` arg ? – François M. Mar 31 '16 at 10:26
  • (I'm trying to use `loess`, with this : `stat_quantile(formula = loess(mydf$pdm ~ mydf$time), quantiles = c(0.25, 0.5, 0.75))`, and I get this error : `Warning messages:` `1: 'newdata' had 100 rows but variables found have 6816 rows` `2: Computation failed in 'stat_quantile()':` `replacement has 6816 rows, data has 100 `, I have no idea what it's trying to tell me. – François M. Mar 31 '16 at 10:37
  • I wasn't able to get `loess` to work in `stat_quantile`. 'stat_quantile` uses the `rq` function from the `quantreg` package. `rq` handles formulas with polynomials and splines (as in my answer), but doesn't do `loess` regression (as far as I can tell). A note on syntax: *If* the `loess` function could do quantile regression, the call would be something like this: `stat_quantile(method="loess", quantiles=c(0.25,0.75))` (which also implicitly uses `formula=y~x`). – eipi10 Apr 01 '16 at 15:03
  • Yes, I tried it too, to no avail. Thank you for trying though. Even though I'm not sure it is exactly what I wanted, I ended up doing what's in here https://stat.ethz.ch/pipermail/r-help/2011-October/291575.html, which is putting the quantiles in the DF, and plotting `stat_smooth()` on each one : `ggplot(data = qdfm, aes(x=time, y=pdm)) + geom_point() + geom_smooth(col="red") + stat_smooth(data = qdfm, aes(x = time, y = Q25)) + stat_smooth(data = qdfm, aes(x = time, y = Q75))` – François M. Apr 07 '16 at 09:02