I am using `geom_smooth` from the ggplot2 package to draw a smoothed line on a time series scatter plot (one point for each day of the year, so 365 points). One of its arguments is `span`, and the help file (`?geom_smooth`) describes it as follows:

span controls the amount of smoothing for the default loess smoother. Smaller numbers produce wigglier lines, larger numbers produce smoother lines.

However, this doesn't actually tell me what the span argument is controlling. Setting it to 1 is useless, and setting it to 0.1 provides something that looks good.

span = 0.5

[Figure: plot using `span = 0.5`]

span = 0.1

[Figure: plot using `span = 0.1`]

However, when describing the plot, since I'm not totally sure what span actually changes, I'm not sure how to describe the smoothed line. Any pointers?

merv
amccnnll
  • Some of the information in [here](http://www.statsdirect.com/help/content/nonparametric_methods/loess.htm) might be helpful. – conrad-mac Feb 20 '17 at 08:06
  • The explanation of the `f` parameter of the `lowess` function might also help your understanding: _...the smoother span. This gives the proportion of points in the plot which influence the smooth at each value. Larger values give more smoothness._ See [here](https://stat.ethz.ch/R-manual/R-devel/library/stats/html/lowess.html). – conrad-mac Feb 20 '17 at 08:10
  • Thanks Conrad - that's exactly what I was hoping to find, it makes a lot more sense now. Do you know how I could suggest that they add that link to the help page? I think it would be really useful. – amccnnll Feb 20 '17 at 20:36
  • No problem. I'm not sure what the correct way is. You could potentially submit a pull request [here](https://github.com/tidyverse/ggplot2/blob/master/R/geom-smooth.r). Alternatively, I found [this](http://hadley.wufoo.com/forms/documentation-feedback/def/field0=geom_smooth) documentation feedback page by clicking the link at the bottom of http://docs.ggplot2.org/current/geom_smooth.html – conrad-mac Feb 21 '17 at 04:24

2 Answers

The span (also called alpha, α) determines the width of the moving window used when smoothing your data.

"In a loess fit, the alpha parameter determines the width of the sliding window. More specifically, alpha gives the proportion of observations that is to be used in each local regression. Accordingly, this parameter is specified as a value between 0 and 1. The alpha value used for the loess curve in Fig. 2 is 0.65; so, each of the local regressions used to produce that curve incorporates 65% of the total data points."

Taken from:

Jacoby (2000) Loess: a nonparametric, graphical tool for depicting relationships between variables. Electoral Studies 19(4). (Paywalled paper)

For more details check the referenced paper.
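
To make the "proportion of observations" idea concrete for the 365-point case in the question, here is a minimal sketch in R. `local_weights` is a hypothetical helper written for illustration (it is not part of ggplot2 or `stats`), and the window size `floor(span * n)` is an approximation of what a loess implementation does, not a copy of it. The tricube weight formula is the one described in `?loess`.

```r
# Hypothetical helper (not part of ggplot2 or stats): the weights a
# loess-style local fit would give each observation when estimating
# the curve at a target value x0.
local_weights <- function(x, x0, span) {
  k <- max(floor(span * length(x)), 2)  # approximate number of points in the window
  d <- abs(x - x0)                      # distance of every point from the target
  h <- sort(d)[k]                       # distance to the k-th nearest point
  ifelse(d < h, (1 - (d / h)^3)^3, 0)   # tricube weights, zero outside the window
}

day <- 1:365                            # one point per day, as in the question

w_narrow <- local_weights(day, x0 = 180, span = 0.1)
w_wide   <- local_weights(day, x0 = 180, span = 0.5)

# Number of days that influence the smooth at day 180 for each span
sum(w_narrow > 0)
sum(w_wide > 0)
```

With `span = 0.1` only about a tenth of the year (a few weeks either side of the target day) contributes to each local fit, whereas `span = 0.5` draws on roughly half of the year, which is why the second curve is much smoother.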

Ken Y-N
Juan Ossa
  • How to make sense of this window for a multivariate regression (several independent variables)? – Julien Mar 18 '23 at 10:26

LOESS smoothing is a non-parametric form of regression that uses a weighted, sliding-window local fit to calculate a smooth curve through the data. Within each "window", a low-degree polynomial is fitted by weighted least squares, with points nearer the centre of the window weighted more heavily, and the window slides along the x-axis.

One can control the size of this window with the span argument. The span sets alpha, the fraction of the data points included in each window, which governs the degree of smoothing. The smaller the span, the smaller the window, and hence the noisier and more jagged the line.

Look for documentation under LOESS rather than span.
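
The effect of the span can also be checked numerically. The sketch below uses base R's `loess` on synthetic data (a made-up seasonal signal plus noise, not the asker's data) and a hypothetical `roughness` helper that sums the squared second differences of the fitted curve, one simple way to quantify "wiggliness".

```r
set.seed(1)                                  # synthetic data for illustration only
df <- data.frame(day = 1:365)
df$y <- sin(2 * pi * df$day / 365) + rnorm(365, sd = 0.5)

# Hypothetical helper: fit a loess curve with the given span and return
# the sum of squared second differences of the fitted values.
roughness <- function(span) {
  fit <- loess(y ~ day, data = df, span = span)
  sum(diff(fitted(fit), differences = 2)^2)
}

rough <- sapply(c(0.1, 0.5, 1), roughness)
names(rough) <- c("span=0.1", "span=0.5", "span=1")
rough                                        # larger value = wigglier curve
```

The same `span` values can be passed to ggplot2, for example `geom_smooth(method = "loess", span = 0.1)`, which should produce correspondingly wigglier or smoother lines on the plot.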

Law Val