2

I need to plot the relationship between x and y where polynomials of x predict y. This is done using the poly() function in order to ensure polynomials are orthogonal.

How do I plot this relationship considering linear, quadratic and cubic terms together ? The issue is the coefficients for the different terms are not scaled as x is.

I provide some example code below. I have tried reassigning the contrast values for each polynomial to x.

This solution gives impossible predicted values.

Thank you in advance for your help !

Best wishes, Eric

Here is an example code:

x = sample(0:6,100,replace = TRUE)
y = (x*0.2) + (x^2*.05) + (x^3*0.001)
y = y + rnorm(100)
x = poly(x,3)
m = lm(y~x)
TAB = summary(m)$coefficients

### Reassigning the corresponding contrast values to each polynomial of x:
eq = function(x,TAB,start) { 
#argument 'start' is used to determine the position of the linear coefficient, quadratic and cubic follow
pols = poly(x,3)
x1=pols[,1]; x2=pols[,2]; x3=pols[,3]

TAB[1,1] + x1[x]*TAB[start,1] + x2[x] * TAB[start+1,1] + x3[x] * TAB[start+2,1]
}

plot(eq(0:7,TAB,2))
Eric M
  • 23
  • 3
  • 4
    I usually approach a problem like this by getting predicted values via `predict()`. It's essentially a convenience function to do what (I think) you're trying to do. – aosmith Nov 06 '18 at 14:49

1 Answers1

3

Actually, you can use poly directly in formula for lm().

  1. y ~ poly(x, 3) in lm() might be what you want.
  2. For plot, I'll use ggplot2 package which has geom_smooth() function. It can draw the fitted curve. You should specify
    • method = "lm" argument
    • and the formula

library(tidyverse)

x <- sample(0:6,100,replace = TRUE)
y <- (x*0.2) + (x^2*.05) + (x^3*0.001)
eps <- rnorm(100)
(df <- data_frame(y = y + eps, x = x))
#> # A tibble: 100 x 2
#>         y     x
#>     <dbl> <int>
#>  1  3.34      4
#>  2  1.23      5
#>  3  1.38      3
#>  4 -0.115     2
#>  5  1.94      5
#>  6  3.87      6
#>  7 -0.707     3
#>  8  0.954     3
#>  9  1.19      3
#> 10 -1.34      0
#> # ... with 90 more rows

Using your simulated data set,

df %>%
  ggplot() + # this should be declared at first with the data set
  aes(x, y) + # aesthetic
  geom_point() + # data points
  geom_smooth(method = "lm", formula = y ~ poly(x, 3)) # lm fit

enter image description here


If you want to remove the points: erase geom_point()

df %>%
  ggplot() +
  aes(x, y) +
  geom_smooth(method = "lm", formula = y ~ poly(x, 3))

enter image description here


transparency solution: control alpha less than 1

df %>%
  ggplot() +
  aes(x, y) +
  geom_point(alpha = .3) +
  geom_smooth(method = "lm", formula = y ~ poly(x, 3))

enter image description here

younggeun
  • 923
  • 1
  • 12
  • 19
  • Thank you very much ! In my real dataset I have millions of rows, which cannot easily be plotted (I stopped after 30 minutes of waiting). I aim was therefore just to plot the slope, without the data. Do you know of a way to do this ? Thank you again ! – Eric M Nov 06 '18 at 15:47
  • If you do not want the points, just remove `geom_point` in my code. Each `geom_` is just one layer, so you can easily exclude it if you want. – younggeun Nov 06 '18 at 20:52
  • If the problem is too-many-point, there is also an option for **transparency**. `alpha` argument between `0` and `1` does that. For points, in this case, `geom_point(alpha = .5)` or some other small numbers not to interrupt the line can be applied. – younggeun Nov 06 '18 at 21:20