67

Is there a way to extract the values of the fitted line returned from stat_smooth?

The code I am using looks like this:

p <- ggplot(df1, aes(x=Days, y= Qty,group=Category,color=Category))
p <- p + stat_smooth(method=glm, fullrange=TRUE)+ geom_point())

This new r user would greatly appreciate any guidance.

Ken Williams
  • 22,756
  • 10
  • 85
  • 147
MikeTP
  • 7,716
  • 16
  • 44
  • 57
  • 2
    You example code isn't reproducible, can you *make it so*? – James Mar 20 '12 at 15:46
  • 1
    Yes I know, and I will make it so as soon as my R stops saying "Not Responding". I just thought someone might know of the top of their head w/out reproducible code. – MikeTP Mar 20 '12 at 15:51
  • 2
    I believe the answer is no, as AFAIK the smoothers aren't fit+evaluated until the plot is rendered. In general, if you want to manipulate fitted values, etc., you fit the model outside of ggplot and pass the data in to the specific layer you want it in. – joran Mar 20 '12 at 15:53
  • 2
    @joran Actually you can, though its a bit hacky to get the fitted values out of the `ggplot` environment. – James Mar 20 '12 at 16:25

5 Answers5

74

Riffing off of @James example

p <- qplot(hp,wt,data=mtcars) + stat_smooth()

You can use the intermediate stages of the ggplot building process to pull out the plotted data. The results of ggplot_build is a list, one component of which is data which is a list of dataframes which contain the computed values to be plotted. In this case, the list is two dataframes since the original qplot creates one for points and the stat_smooth creates a smoothed one.

> ggplot_build(p)$data[[2]]
geom_smooth: method="auto" and size of largest group is <1000, so using loess. Use 'method = x' to change the smoothing method.
           x        y     ymin     ymax        se PANEL group
1   52.00000 1.993594 1.149150 2.838038 0.4111133     1     1
2   55.58228 2.039986 1.303264 2.776709 0.3586695     1     1
3   59.16456 2.087067 1.443076 2.731058 0.3135236     1     1
4   62.74684 2.134889 1.567662 2.702115 0.2761514     1     1
5   66.32911 2.183533 1.677017 2.690049 0.2465948     1     1
6   69.91139 2.232867 1.771739 2.693995 0.2244980     1     1
7   73.49367 2.282897 1.853241 2.712552 0.2091756     1     1
8   77.07595 2.333626 1.923599 2.743652 0.1996193     1     1
9   80.65823 2.385059 1.985378 2.784740 0.1945828     1     1
10  84.24051 2.437200 2.041282 2.833117 0.1927505     1     1
11  87.82278 2.490053 2.093808 2.886297 0.1929096     1     1
12  91.40506 2.543622 2.145018 2.942225 0.1940582     1     1
13  94.98734 2.597911 2.196466 2.999355 0.1954412     1     1
14  98.56962 2.652852 2.249260 3.056444 0.1964867     1     1
15 102.15190 2.708104 2.303465 3.112744 0.1969967     1     1
16 105.73418 2.764156 2.357927 3.170385 0.1977705     1     1
17 109.31646 2.821771 2.414230 3.229311 0.1984091     1     1
18 112.89873 2.888224 2.478136 3.298312 0.1996493     1     1
19 116.48101 2.968745 2.531045 3.406444 0.2130917     1     1
20 120.06329 3.049545 2.552102 3.546987 0.2421773     1     1
21 123.64557 3.115893 2.573577 3.658208 0.2640235     1     1
22 127.22785 3.156368 2.601664 3.711072 0.2700548     1     1
23 130.81013 3.175495 2.625951 3.725039 0.2675429     1     1
24 134.39241 3.181411 2.645191 3.717631 0.2610560     1     1
25 137.97468 3.182252 2.658993 3.705511 0.2547460     1     1
26 141.55696 3.186155 2.670350 3.701961 0.2511175     1     1
27 145.13924 3.201258 2.687208 3.715308 0.2502626     1     1
28 148.72152 3.235698 2.721744 3.749652 0.2502159     1     1
29 152.30380 3.291766 2.782767 3.800765 0.2478037     1     1
30 155.88608 3.353259 2.857911 3.848607 0.2411575     1     1
31 159.46835 3.418409 2.938257 3.898561 0.2337596     1     1
32 163.05063 3.487074 3.017321 3.956828 0.2286972     1     1
33 166.63291 3.559111 3.092367 4.025855 0.2272319     1     1
34 170.21519 3.634377 3.165426 4.103328 0.2283065     1     1
35 173.79747 3.712729 3.242093 4.183364 0.2291263     1     1
36 177.37975 3.813399 3.347232 4.279565 0.2269509     1     1
37 180.96203 3.910849 3.447572 4.374127 0.2255441     1     1
38 184.54430 3.977051 3.517784 4.436318 0.2235917     1     1
39 188.12658 4.037302 3.583959 4.490645 0.2207076     1     1
40 191.70886 4.091635 3.645111 4.538160 0.2173882     1     1
41 195.29114 4.140082 3.700184 4.579981 0.2141624     1     1
42 198.87342 4.182676 3.748159 4.617192 0.2115424     1     1
43 202.45570 4.219447 3.788162 4.650732 0.2099688     1     1
44 206.03797 4.250429 3.819579 4.681280 0.2097573     1     1
45 209.62025 4.275654 3.842137 4.709171 0.2110556     1     1
46 213.20253 4.295154 3.855951 4.734357 0.2138238     1     1
47 216.78481 4.308961 3.861497 4.756425 0.2178456     1     1
48 220.36709 4.317108 3.859541 4.774675 0.2227644     1     1
49 223.94937 4.319626 3.851025 4.788227 0.2281358     1     1
50 227.53165 4.316548 3.836964 4.796132 0.2334829     1     1
51 231.11392 4.308435 3.818728 4.798143 0.2384117     1     1
52 234.69620 4.302276 3.802201 4.802351 0.2434590     1     1
53 238.27848 4.297902 3.787395 4.808409 0.2485379     1     1
54 241.86076 4.292303 3.772103 4.812503 0.2532567     1     1
55 245.44304 4.282505 3.754087 4.810923 0.2572576     1     1
56 249.02532 4.269040 3.733184 4.804896 0.2608786     1     1
57 252.60759 4.253361 3.710042 4.796680 0.2645121     1     1
58 256.18987 4.235474 3.684476 4.786473 0.2682509     1     1
59 259.77215 4.215385 3.656265 4.774504 0.2722044     1     1
60 263.35443 4.193098 3.625161 4.761036 0.2764974     1     1
61 266.93671 4.168621 3.590884 4.746357 0.2812681     1     1
62 270.51899 4.141957 3.553134 4.730781 0.2866658     1     1
63 274.10127 4.113114 3.511593 4.714635 0.2928472     1     1
64 277.68354 4.082096 3.465939 4.698253 0.2999729     1     1
65 281.26582 4.048910 3.415849 4.681971 0.3082025     1     1
66 284.84810 4.013560 3.361010 4.666109 0.3176905     1     1
67 288.43038 3.976052 3.301132 4.650972 0.3285813     1     1
68 292.01266 3.936392 3.235952 4.636833 0.3410058     1     1
69 295.59494 3.894586 3.165240 4.623932 0.3550782     1     1
70 299.17722 3.850639 3.088806 4.612473 0.3708948     1     1
71 302.75949 3.804557 3.006494 4.602619 0.3885326     1     1
72 306.34177 3.756345 2.918191 4.594499 0.4080510     1     1
73 309.92405 3.706009 2.823813 4.588205 0.4294926     1     1
74 313.50633 3.653554 2.723308 4.583801 0.4528856     1     1
75 317.08861 3.598987 2.616650 4.581325 0.4782460     1     1
76 320.67089 3.542313 2.503829 4.580796 0.5055805     1     1
77 324.25316 3.483536 2.384853 4.582220 0.5348886     1     1
78 327.83544 3.422664 2.259739 4.585589 0.5661643     1     1
79 331.41772 3.359701 2.128512 4.590891 0.5993985     1     1
80 335.00000 3.294654 1.991200 4.598107 0.6345798     1     1

Knowing a priori where the one you want is in the list isn't easy, but if nothing else you can look at the column names.

It is still better to do the smoothing outside the ggplot call, though.

EDIT:

It turns out replicating what ggplot2 does to make the loess is not as straightforward as I thought, but this will work. I copied it out of some internal functions in ggplot2.

model <- loess(wt ~ hp, data=mtcars)
xrange <- range(mtcars$hp)
xseq <- seq(from=xrange[1], to=xrange[2], length=80)
pred <- predict(model, newdata = data.frame(hp = xseq), se=TRUE)
y = pred$fit
ci <- pred$se.fit * qt(0.95 / 2 + .5, pred$df)
ymin = y - ci
ymax = y + ci
loess.DF <- data.frame(x = xseq, y, ymin, ymax, se = pred$se.fit)

ggplot(mtcars, aes(x=hp, y=wt)) +
  geom_point() +
  geom_smooth(aes_auto(loess.DF), data=loess.DF, stat="identity")

That gives a plot that looks identical to

ggplot(mtcars, aes(x=hp, y=wt)) +
  geom_point() +
  geom_smooth()

(which is the expanded form of the original p).

Brian Diggs
  • 57,757
  • 13
  • 166
  • 188
  • Nice, it's a lot more useful to get the x values too. – James Mar 20 '12 at 17:30
  • Thanks. Rookie curious, why do you say it is better to do smoothing outside the ggplot call? Seems like if you need to produce the plots anyway, it would be repetitive to do a sepreate fit just to extract the fit values. – MikeTP Mar 20 '12 at 19:03
  • 1
    I wouldn't do the fit outside the ggplot call in addition to one inside of it, but rather instead of. I would form the fit data into an appropriate format/dataframe and then plot that. There is an example of that buried in this answer: http://stackoverflow.com/a/9742012/892313 – Brian Diggs Mar 20 '12 at 19:19
  • 3
    @MikeTP You need far more than the "fit" to evaluate the fitted model and therefore judge whether the "fit" is adequate or not. – Gavin Simpson Mar 21 '12 at 09:57
  • @BrianDiggs...do you have a example simular to what you referenced above (Mar 20 at 19:19) of how to recreate a loess fit plotthat was created outside of ggplot? – MikeTP Mar 23 '12 at 19:28
  • @MikeTP I added an example of performing a `loess` beforehand to the answer. – Brian Diggs Mar 26 '12 at 17:20
  • thanks..very helpful,I really appreciate it. Last question and I won't bother you for a very long time. If my origional loess tranformed the y variable [i.e. model <- loess(log10(wt) ~ hp, data=mtcars) ] what is the best way to produced the produce the plot in non log10 space? I can get most of it by producing a secondary loess.DF2 data frame with the necessary transformations back to linear space but for some reason I can't get the 95% ci to show up. Regardless, I'm guessing you have a far more efficient way to do it than crating a redundant loess.DF2 data frame. – MikeTP Mar 26 '12 at 18:54
  • 1
    Changing the model line to `model <- loess(log10(wt) ~ hp, data=mtcars)` and the loess.DF line to `loess.DF <- data.frame(x = xseq, y=10^y, ymin=10^ymin, ymax=10^ymax, se = pred$se.fit)`, I get a plot with a curve with a band around it. Basically, I computed `y`, `ymin`, and `ymax` in log space and then transformed those back. – Brian Diggs Mar 26 '12 at 19:24
  • This technique rocks. By increasing the sampling (n-365) I can get smoothing data for everyday of a year long timeseries which makes my life super easy for some analysis. Awesome, thanks. – BarneyC Jun 10 '15 at 09:37
68

stat_smooth does produce output that you can use elsewhere, and with a slightly hacky way, you can put it into a variable in the global environment.

You enclose the output variable in .. on either side to use it. So if you add an aes in the stat_smooth call and use the global assign, <<-, to assign the output to a varible in the global environment you can get the the fitted values, or others - see below.

qplot(hp,wt,data=mtcars) + stat_smooth(aes(outfit=fit<<-..y..))
fit
 [1] 1.993594 2.039986 2.087067 2.134889 2.183533 2.232867 2.282897 2.333626
 [9] 2.385059 2.437200 2.490053 2.543622 2.597911 2.652852 2.708104 2.764156
[17] 2.821771 2.888224 2.968745 3.049545 3.115893 3.156368 3.175495 3.181411
[25] 3.182252 3.186155 3.201258 3.235698 3.291766 3.353259 3.418409 3.487074
[33] 3.559111 3.634377 3.712729 3.813399 3.910849 3.977051 4.037302 4.091635
[41] 4.140082 4.182676 4.219447 4.250429 4.275654 4.295154 4.308961 4.317108
[49] 4.319626 4.316548 4.308435 4.302276 4.297902 4.292303 4.282505 4.269040
[57] 4.253361 4.235474 4.215385 4.193098 4.168621 4.141957 4.113114 4.082096
[65] 4.048910 4.013560 3.976052 3.936392 3.894586 3.850639 3.804557 3.756345
[73] 3.706009 3.653554 3.598987 3.542313 3.483536 3.422664 3.359701 3.294654

The outputs you can obtain are:

  • y, predicted value
  • ymin, lower pointwise confidence interval around the mean
  • ymax, upper pointwise confidence interval around the mean
  • se, standard error

Note that by default it predicts on 80 data points, which may not be aligned with your original data.

James
  • 65,548
  • 14
  • 155
  • 193
  • 18
    You're not kidding when you call that hacky. (You missed the `<<-` in your code.) – joran Mar 20 '12 at 16:31
  • 1
    @joran Thanks, fixed now. Though I'm getting a bit suspicious about the numbers that its producing. – James Mar 20 '12 at 16:36
  • @joran Got it working now, it neeeds to be in the `stat_smooth` call, otherwise it just uses the `y` values of the original data. – James Mar 20 '12 at 16:46
  • @James, Thats great! You mentioned by default it only predicts 80 data points, do you know how to change (increase) this? – MikeTP Mar 20 '12 at 17:00
  • 4
    @MikeTP See `?stat_smooth` and the argument `n`. – joran Mar 20 '12 at 17:15
  • 3
    Thank you. Great anwsers guys I really appreciate it. On thing I will point out is that I have discovered is you need to actually call the plot directly to generate the fit. In my case I am looping through some regions and using the pdf and print to store to pdf file after using annotation_custom to add a small plot within the main plot. Seems the print(p) dosent generate the fit, so I addded a simple p (I think this is called "rendering" the plot) and it generated the fit variable in my workspace. – MikeTP Mar 20 '12 at 19:00
20

A more general approach could be to simply use the predict() function to predict any range of values that are interesting.

# define the model
model <- loess(wt ~ hp, data = mtcars)

# predict fitted values for each observation in the original dataset
modelFit <- data.frame(predict(model, se = TRUE))

# define data frame for ggplot
df <- data.frame(cbind(hp = mtcars$hp
          , wt = mtcars$wt
          , fit = modelFit$fit
          , upperBound = modelFit$fit + 2 * modelFit$se.fit
          , lowerBound = modelFit$fit - 2 * modelFit$se.fit
          ))

# build the plot using the fitted values from the predict() function
# geom_linerange() and the second geom_point() in the code are built using the values from the predict() function
# for comparison ggplot's geom_smooth() is also shown
g <- ggplot(df, aes(hp, wt))
g <- g + geom_point()
g <- g + geom_linerange(aes(ymin = lowerBound, ymax = upperBound))
g <- g + geom_point(aes(hp, fit, size = 1))
g <- g + geom_smooth(method = "loess")
g

# Predict any range of values and include the standard error in the output
predict(model, newdata = 100:300, se = TRUE)
phillyooo
  • 1,523
  • 2
  • 16
  • 22
11

If you want to bring in the power of the tidyverse, you can use the "broom" library to add the predicted values from the loess function to your original dataset. This is building on @phillyooo's solution.

library(tidyverse)
library(broom)

# original graph with smoother
ggplot(data=mtcars, aes(hp,wt)) + 
  stat_smooth(method = "loess", span = 0.75)

# Create model that will do the same thing as under the hood in ggplot2
model <- loess(wt ~ hp, data = mtcars, span = 0.75)

# Add predicted values from model to original dataset using broom library
mtcars2 <- augment(model, mtcars)

# Plot both lines 
ggplot(data=mtcars2, aes(hp,wt)) + 
  geom_line(aes(hp, .fitted), color = "red") +
  stat_smooth(method = "loess", span = 0.75)
hd1
  • 33,938
  • 5
  • 80
  • 91
Nova
  • 5,423
  • 2
  • 42
  • 62
8

Save the graph object and use ggplot_build() or layer_data() to obtain the elements/estimates for the layers. e.g.

pp<-ggplot(mtcars, aes(x=hp, y=wt)) +   geom_point() +   geom_smooth();
ggplot_build(pp)
Nick
  • 138,499
  • 22
  • 57
  • 95
cbudd
  • 97
  • 1
  • 2