
I am attempting to understand how the predict.loess function is able to compute new predicted values (y_hat) at points x that do not exist in the original data. For example (this is a simple example and I realize loess is obviously not needed for an example of this sort but it illustrates the point):

x <- 1:10
y <- x^2
mdl <- loess(y ~ x)
predict(mdl, 1.5)
[1] 2.25

loess regression works by fitting low-degree polynomials locally around each x, and thus it produces a predicted y_hat at each x in the data. However, because no global coefficients are stored, the "model" in this case is simply the details of what was used to predict each y_hat, for example the span or degree. When I do predict(mdl, 1.5), how is predict able to produce a value at this new x? Is it interpolating between the two nearest existing x values and their associated y_hat? If so, what are the details behind how it is doing this?
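The local-fitting idea can be sketched by hand (a simplified illustration of the mechanism, not loess's exact algorithm; the names x0, span, maxd are just for this sketch):

```r
# One local weighted quadratic fit at x0, with tricube weights over the span
x <- 1:10
y <- x^2
x0 <- 1.5
span <- 0.75                                    # loess's default span
k <- ceiling(span * length(x))                  # points in the neighbourhood
d <- abs(x - x0)
maxd <- sort(d)[k]                              # distance to k-th nearest point
w <- ifelse(d <= maxd, (1 - (d / maxd)^3)^3, 0) # tricube weights, 0 outside
fit <- lm(y ~ x + I(x^2), weights = w)          # local quadratic (degree = 2)
predict(fit, data.frame(x = x0))
```

Because y is exactly quadratic here, the weighted fit recovers y = x^2 and returns 2.25, matching predict(mdl, 1.5) above; on noisy data the local fit would of course differ from point to point.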

I have read the cloess documentation online but am unable to find where it discusses this.

Alex
  • Interpolation, extrapolation, or both? I think you mean interpolation only. – smci Mar 30 '14 at 03:16
  • Here's [a link](https://stats.stackexchange.com/questions/223469/how-does-a-loess-model-do-its-prediction)! I hope this may help. – Seongje Chae Nov 16 '21 at 08:06

4 Answers


However, because there are no coefficients being stored, the "model" in this case is simply the details of what was used to predict each y_hat

Maybe you have used the print(mdl) command, or simply typed mdl, to see what the model mdl contains, but that is not all there is. The model object is really quite complicated and stores a large number of parameters.

To get an idea of what's inside, you may use unlist(mdl) (or str(mdl)) and see the long list of components in it.

This is the part of the manual for the command that describes how it really works:

Fitting is done locally. That is, for the fit at point x, the fit is made using points in a neighbourhood of x, weighted by their distance from x (with differences in ‘parametric’ variables being ignored when computing the distance). The size of the neighbourhood is controlled by α (set by span or enp.target). For α < 1, the neighbourhood includes proportion α of the points, and these have tricubic weighting (proportional to (1 - (dist/maxdist)^3)^3). For α > 1, all points are used, with the ‘maximum distance’ assumed to be α^(1/p) times the actual maximum distance for p explanatory variables.

For the default family, fitting is by (weighted) least squares. For family="symmetric" a few iterations of an M-estimation procedure with Tukey's biweight are used. Be aware that as the initial value is the least-squares fit, this need not be a very resistant fit.

What I believe is that it fits a polynomial model in the neighbourhood of every point (not just a single polynomial for the whole set). But the neighbourhood does not mean only one point before and one point after: if I were implementing such a function, I would put a large weight on the points nearest to x, lower weights on more distant points, and fit the polynomial that best fits under those weights.

Then, if the given x' whose value should be predicted is closest to point x, I would take the polynomial fitted on the neighbourhood of x, say P, and evaluate it at x' to get P(x'); that would be the prediction.

Let me know if you are looking for anything special.

Ali
Thank you for your answer. However, the logic/math behind polynomial regression is described in my question. I am attempting to understand how one computes intermediate points. It must be through some sort of interpolation? – Alex Oct 10 '12 at 15:12
  • 1
    Thank you, yes, this is exactly what i describe in the question. Please note: "the fit at point x, the fit is made using points in a neighbourhood of x". the question is: what happens between x_1 and x_2.. at, for example, x_1 + epsilon that does not exist in the data-set – Alex Oct 10 '12 at 15:16
  • I've added two more paragraphs. Don't hesitate to ask if any part is not clear – Ali Oct 10 '12 at 15:20
  • Thank you. However, I think we are misunderstanding each other. Let's say there is a point (y_1, x_1) at which we want to compute g(y). We use all points (within span of x_1 and y_1) to compute an OLS regression. We repeat this process for all points (x_1..x_n) and therefore have g(y_1)...g(y_n). However, what happens if we look at a point x_1+epsilon not in the original dataset? We do not have a g(y_1+epsilon) to look up? – Alex Oct 10 '12 at 15:24
  • 1
    If every point (say x_1+epsilon) was in the dataset, what remained to be predicted? The other point is that we don't have just a single polynomial g(x), but say n polynomials g_1(x), g_2(x) ... g_n(x) such that g_i(x) is created to best fit the points in neighborhood of (x_i, y_i). Simply use the fitted polynomial to the closest point available in the dataset (say x_1) to predict it (So your answer would be g_1(x_1 + epsilon). – Ali Oct 10 '12 at 15:30

To better understand what is happening in a loess fit try running the loess.demo function from the TeachingDemos package. This lets you interactively click on the plot (even between points) and it then shows the set of points and their weights used in the prediction and the predicted line/curve for that point.

Note also that the default for loess is to do a second smoothing/interpolating on the loess fit, so what you see in the fitted object is probably not the true loess fitting information, but the secondary smoothing.
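This distinction can be seen directly by comparing the two surface modes that loess.control supports: the default surface = "interpolate" builds the secondary interpolation structure, while surface = "direct" evaluates the loess fit exactly at each new point (on real, noisy data the two can differ slightly; on this exactly quadratic toy data both return 2.25):

```r
x <- 1:10
y <- x^2
m_interp <- loess(y ~ x)   # default: surface = "interpolate"
m_direct <- loess(y ~ x, control = loess.control(surface = "direct"))
predict(m_interp, 1.5)     # interpolated from precomputed vertex fits
predict(m_direct, 1.5)     # local fit recomputed at 1.5
```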

Greg Snow

Found the answer on page 42 of the manual:

In this algorithm a set of points typically small in number is selected for direct computation using the loess fitting method and a surface is evaluated using an interpolation method that is based on blending functions. The space of the factors is divided into rectangular cells using an algorithm based on k-d trees. The loess fit is evaluated at the cell vertices and then blending functions do the interpolation. The output data structure stores the k-d trees and the fits at the vertices. This information is used by predict() to carry out the interpolation.
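In R's loess this interpolation structure appears to be stored in the fitted object itself (my reading of the implementation; with the default surface = "interpolate", the kd component holds the k-d tree cells and vertex values that predict() interpolates from):

```r
x <- 1:10
y <- x^2
mdl <- loess(y ~ x)   # default surface = "interpolate"
str(mdl$kd)           # k-d tree cells and vertex fits used by predict()
```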
Alex
  • Which manual? I am attempting to find the answer myself and would like to see the blending functions. – Mark Miller Jul 08 '15 at 14:07
  • I think you are quoting this document: http://www.netlib.org/a/cloess.pdf which seems to be an appendix to a paper or report by William S. Cleveland, Eric Grosse, and Ming-Jen Shyu. Although I am not certain about the citation because I have not located the main document, just the appendix. – Mark Miller Jul 08 '15 at 14:51

I guess that to predict at x, predict.loess performs a regression with some points near x and calculates the y-value at x.

Visit https://stats.stackexchange.com/questions/223469/how-does-a-loess-model-do-its-prediction

Dharman