I apply a linear regression over a data set:

## Random set 
set.seed(123)
i0=0
imax = 100
x = seq(0,imax,1)
y = c(i0)
for( i in 1:imax ){
  i1 = rnorm( n = 1, mean = i0, sd = 1)
  y = c( y, i1 )
  i0 = i1
}
plot(x,y)

## Build a data frame out of it
d0 = data.frame( x, y )

## Apply a linear regression 
f0 = lm( d0$y ~ d0$x )

## Plot the fitted function
abline(f0)

Now I want to use this fitted function to get the predicted value for:

  1. Interpolated values (e.g. x=3.5)
  2. Extrapolated values (e.g. x=110)

The only answer I found on the web was:

y2=predict(f0, data.frame(x=seq(0,100,1)))

But this is not what I want. I could of course compute these values by hand from the fitted parameters, but I would like a general solution.

Any hint welcome!

Zheyuan Li
Xavier Prudent
  • What do you mean by "intrapolated" and "extrapolated" values? What kind of result are you expecting (`predict` returns predictions from the model using new data; what else do you need)? – Tim May 23 '16 at 18:53
  • You are representing a time series data purely by its linear trend even if there is much more information available in the data. Is this really what you are looking for? – Michael M May 23 '16 at 19:00
  • This example is just homemade. What I am trying to do is fit a function f(x) = c1 * x + c2 * x + c3... over a time series using lm, and then use the result of lm to compute f(x) for whatever value of x I fancy. – Xavier Prudent May 23 '16 at 19:03
  • Linearly extra- or interpolating time series is rarely smart, but to answer the question, just use `predict(f0, data.frame(x = z))` with `z` being any value. The R help is usually rather good, so typing `?predict.lm` will explain it in more detail. You have to use `f0 = lm(y ~ x, data = d0)` to make clear what the variables are called. – Michael M May 23 '16 at 19:11
  • I had looked at the predict function, but it spat in my face while I was just kindly doing `predict(f0, data.frame(x=20))` and got: 'newdata' had 1 row but variables found have 101 rows – Xavier Prudent May 23 '16 at 19:19
  • Hehehe. Check out the edit in my last comment. It is all about the variable names. R thinks the x variable is called `d0$x`. – Michael M May 23 '16 at 19:51
  • Do you mean the correct command line should be: predict(f0, data.frame(d0$x = 3.5)) ? – Xavier Prudent May 24 '16 at 09:37
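The fix discussed in the comments above can be sketched as follows. This is a minimal example, assuming a random walk built like the one in the question (here with `cumsum` instead of the loop) and the formula `y ~ x` with `data = d0`, as suggested by Michael M. The name `new_points` is only illustrative.

```r
# A random walk similar to the one in the question
set.seed(123)
imax <- 100
x <- seq(0, imax, 1)
y <- c(0, cumsum(rnorm(imax)))
d0 <- data.frame(x, y)

# Fitting with d0$y ~ d0$x labels the predictor "d0$x", so predict()
# cannot match it against a newdata column called x
f_bad <- lm(d0$y ~ d0$x)
names(coef(f_bad))

# Naming the variables in the formula and passing the data frame
# separately makes the model know its predictor as "x"
f0 <- lm(y ~ x, data = d0)

# newdata only needs a column called x; interpolated (3.5) and
# extrapolated (110) values are handled the same way
new_points <- data.frame(x = c(3.5, 110))
pred <- predict(f0, newdata = new_points)
pred
```

So `predict(f0, data.frame(d0$x = 3.5))` is not needed (and is not valid R syntax); refitting with `data = d0` is the simpler route.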

1 Answer

In linear regression your best estimate (regardless of whether you interpolate or extrapolate) is always just the fitted value. That is, you have the equation:

$$ y = \beta_0 + \beta_1 x_1 + \dotso + \beta_k x_k $$

You simply plug in the values, multiply each by its $\beta_j$, and sum. The easy way to do this is to store the data in vectors, like so:

$$ y = \boldsymbol{\beta}'X $$

Where $\boldsymbol{\beta}'$ is a row vector of coefficients $(1 \times (k+1))$ and $X$ is a column vector $((k+1) \times 1)$ whose first entry is 1 (for the intercept $\beta_0$). Hence $y$ is a scalar (the fitted/predicted value). In R it would be like:

# Generate data:
x <- rgamma(n = 1000, shape =  2)
y <- 5 + 0.5*x + rnorm(1000)
reg1 <- lm(y ~ x)

# Now for doing unit prediction:
some_new_x <- 5 # This is the new value of x you wish to predict for
intercept  <- 1 # This is always 1
coef(reg1) %*% c(intercept, some_new_x)

# We can also do predictions for an entire data frame.
# predict() adds the intercept itself, so newdata only needs a column named x:
new_x <- seq(from = 1, to = 1000, by = 1)
predict(reg1, newdata = data.frame(x = new_x))

Using `predict` should really be your preferred way. It keeps track of variables by name, so you do not have to arrange them in the right order to get a meaningful number.
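To illustrate the point about names, here is a small sketch with two predictors. The data, the coefficients used to simulate it, and the names `dat`, `fit`, `new_obs`, `by_predict` and `by_hand` are all made up for the example.

```r
# Simulated data with two predictors (values are arbitrary)
set.seed(1)
n <- 200
dat <- data.frame(x1 = rnorm(n), x2 = runif(n))
dat$y <- 1 + 2 * dat$x1 - 3 * dat$x2 + rnorm(n, sd = 0.1)
fit <- lm(y ~ x1 + x2, data = dat)

# Columns deliberately given in the "wrong" order: predict() matches by name
new_obs <- data.frame(x2 = 0.25, x1 = 1.5)
by_predict <- predict(fit, newdata = new_obs)

# Manual version: the order must follow coef(fit), i.e. intercept, x1, x2
by_hand <- sum(coef(fit) * c(1, new_obs$x1, new_obs$x2))

# Both routes should agree up to floating-point tolerance
all.equal(unname(by_predict), by_hand)
```

With the manual route, swapping `x1` and `x2` in the vector would silently give a wrong number, which is the mistake `predict` guards against.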

Repmat
  • Thanks, however I want to avoid having to implement the function when looking for the interpolation. Is that really the only way? – Xavier Prudent May 23 '16 at 18:51
  • What do you mean by implementing the function? It is right there in base R, no packages needed. – Repmat May 23 '16 at 18:52
  • I mean, if I fit the function f(x) = c1 * x + c2 * c + c3, is there a way to get f(x=3.5) using the result of lm, or do I need to extract the coefficients c1,c2,c3 from lm and compute it? Am I making my issue clear? – Xavier Prudent May 23 '16 at 19:00
  • Yes, you need all the coefficients (or assume the rest are all 0). `predict` is what you want; it does exactly what you describe. – Repmat May 23 '16 at 19:04
  • I'm not sure where `c` comes from in the above equation, but in `R`, you would write a small lightweight function for that use case using the ideas @Repmat discusses in the answer. If you want to hide the call to `coef` you can write a function that takes the `lm` object as an argument, and then uses `coef` in the function body. – Matthew Drury May 23 '16 at 19:05
  • c was a typo. I had played with predict: `predict(f0, data.frame(x=20))` but got: 'newdata' had 1 row but variables found have 101 rows – Xavier Prudent May 23 '16 at 19:07
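The idea in the comments of wrapping the fitted model in a small function could look roughly like this. It is only a sketch: `make_predictor` is a hypothetical helper, not part of base R, and it delegates to `predict` rather than calling `coef` directly. It assumes a model fitted with a named predictor via `data =`.

```r
# Hypothetical helper: turn a fitted lm with one predictor into f(x)
make_predictor <- function(fit, var = "x") {
  force(fit)
  force(var)
  function(x_new) {
    # Build a one-column data frame whose column carries the predictor name
    newdata <- setNames(data.frame(x_new), var)
    unname(predict(fit, newdata = newdata))
  }
}

# Made-up data for the demonstration
set.seed(42)
d <- data.frame(x = 0:100)
d$y <- 2 + 0.5 * d$x + rnorm(nrow(d))

f <- make_predictor(lm(y ~ x, data = d))
f(3.5)            # interpolated value
f(c(3.5, 110))    # several values at once, including an extrapolated one
```

Because `predict` rebuilds the model terms from the formula, the same wrapper should also work for fits such as `lm(y ~ x + I(x^2), data = d)` without extracting any coefficients by hand.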