2

I am trying to do something pretty simple with R but I am not sure I am doing it well. I have a dataset containing three columns V1,V4,V5 and I want to do a regression to get the coefficients Ci,j of the following polynomial of two variables:

sum[i=0->3] sum[j=0->i] Ci,j . (V4_k)^i . (V5_k)^(3-j)

So I tried using the function polym:

lm(V1 ~ polym(V4, V5, degree=3, raw = TRUE), data)

which gives me the following coefficients

[1]  1.048122e+04 -2.050453e+02  1.407736e+00 -3.309312e-03 -3.748650e+01  8.983050e-01 -4.308559e-03  1.834724e-01 -6.868446e-04  4.030224e-04

Now, if I understand well how we must build a formula, I assumed that the following would give the same:

lm(v1 ~ V4 + V5 + I(V4 * V5) + I(V4^2 * V5) + I(V4^3 * V5) + I(V4^2 * V5^2) + I(V4^2*V5^3) + I(V4^3 * V5^2) + I(V4^3 * V5^3), data)

But I get different coefficients:

[1]  3.130403e+03 -1.652007e+01 -1.592879e+02  3.984177e+00 -2.419069e-02  3.919910e-05  1.008657e-04  4.271893e-07 -5.305623e-07 -2.289836e-09

Could you please tell me what I am doing wrong, and what is the correct way to achieve this regression with R?

Tanguy A.
  • 155
  • 2
  • 11
  • Did you check the order of return matrix from poly()? It seems like its returning V4, V4^2, V4^3, V5, V5^2, V5^3,... which is different from your custom equation. Also, V4^3 * V5 is 4th order polynomial. – won782 Aug 20 '14 at 16:11

2 Answers2

2

The polym(V4, V5) call is not giving you what you think it is. (It doesn't matter if you use poly or polym for this example)

Let's look at an example:

v1 <- 1:10; v2 <- 1:10
poly(v1, v2, degree=3, raw=TRUE)
      1.0 2.0  3.0 0.1 1.1  2.1 0.2  1.2  0.3
 [1,]   1   1    1   1   1    1   1    1    1
 [2,]   2   4    8   2   4    8   4    8    8
 [3,]   3   9   27   3   9   27   9   27   27
 [4,]   4  16   64   4  16   64  16   64   64
 [5,]   5  25  125   5  25  125  25  125  125
 [6,]   6  36  216   6  36  216  36  216  216
 [7,]   7  49  343   7  49  343  49  343  343
 [8,]   8  64  512   8  64  512  64  512  512
 [9,]   9  81  729   9  81  729  81  729  729
[10,]  10 100 1000  10 100 1000 100 1000 1000

The column label is telling you the degree of the first and second vectors that you gave as arguments. The first three are from V2^0, the seconds three are linear in V2, and so on.

This is correct, but your second example has 4th degree terms in it. If you are actually looking for the 4th degree terms, just change degree to be 4 in the method call.

If you need some more help with polynomial regression, this article, on R-Bloggers should be helpful. It shows how to create models with both I() and poly although I think they were just univariate.

MentatOfDune
  • 309
  • 1
  • 9
1

With the sample data

dd<-data.frame(x1=rnorm(50),
   x2=rnorm(50))
dd<-transform(dd, z = 2*x1-.5*x1*x2 + 3*x2^2+x1^2 + rnorm(50))

we see that

lm(z~polym(x1,x2,degree=3, raw=T), dd)
lm(z~x1+I(x1^2)+I(x1^3)+I(x2)+I(x1*x2) + 
   I(x1^2*x2)+I(x2^2) + I(x1*x2^2) + I(x2^3), dd)

are the same.

Note that in your expansion, you have terms like

I(V4^3 * V5) + I(V4^2 * V5^2)

which are both 4th degree terms (the sum of the exponents is 4) so they should not appear in a third degree polynomial. So it depends on what you want. Normally, for a third degree polynomial, you have

sum[i=0->3] sum[j=0->3-i] Ci,j . (V4_k)^i . (V5_k)^j

so i+j<=3 always. It's unclear to me exactly what type of regression you want.

MrFlick
  • 195,160
  • 17
  • 277
  • 295
  • Indeed, my expansion was wrong. But now how to explain the fact that with the first call (using polym), I get 10 coefficients, and not only 6 like I get with the proper expansion (when removing the exponents >3)? – Tanguy A. Aug 20 '14 at 18:20
  • @TanguyA. If you look at my example above which has the proper expansion, there are 9 terms, plus an intercept term so you get 10 total coefficient estimates. So 10 is the correct number. Sounds like you are missing some terms in your expansion. – MrFlick Aug 20 '14 at 18:31
  • You are right. Now I get the same results with both calls. In that way I guess I get the coefficients I am looking for. Thanks for your answers! – Tanguy A. Aug 20 '14 at 18:32