1

Specifically, I'm getting the error

Error: variable 'poly(x, degree = i, raw = TRUE)' was fitted with type "nmatrix.2" but type "nmatrix.5" was supplied

I was provided this starting code by my professor:

f = function(x) {
    return(x**2)
}

#Set a terribly boring seed for reproducibility
set.seed(1234)

#Make our training data
N = 100
x = sort(runif(N, 0, 1))
y = f(x) + rnorm(N, mean=0, sd=.3)

data_train = data.frame(x,y)

#Plot real quick
plot(x,y)

and the instructions for the problem say

fit four polynomials to the training data generated above: Fit polynomials of order 0, 1, 2, and 3 using the R function lm(), and name the resulting regression models fit0, fit1, fit2, and fit3 respectively. Additionally, plot the training data and the predictions for all four models (so you will have to make some predictions).

This is what I did so far:

fit0 = lm(formula = y ~ 1, data = data_train)

for(i in 1:5)
{
    assign(paste0("fit", i), lm(formula = y ~ poly(x, degree = i, raw = TRUE), 
    data = data_train))
}

plot(x,y)

abline(v = fit0$coef[1], col = 'blue')
abline(a = fit2$coef[1], b = fit2$coef[2], col = 'red')

y2 = predict(fit2, newdata = data.frame(x))

It works fine to create my fit variables and generate the scatterplot, but once I added the predict line (in order to get the values to plot for the quadratic model), it started giving me that weird error and I'm completely stumped as to why. I Googled the error message and I've been reading through the search results for hours, but the only example I found that seemed to actually match my problem was on a Japanese website, and Google translate clearly butchered the translation because the resulting grammar didn't make a lot of sense, so I couldn't really understand the posts.

I've also been reading and rereading my code to check that it matches the examples of how to plot polynomial regression equations in my course lecture notes and from my own previous examples of doing it that worked and I can't find anything meaningfully different here. I'm really confused because of the matrix thing in particular - I'm not using matrices at all,, so the only thing I can think of is that the messy and complicated structure of lm objects involves them somewhere? I just literally don't know what else to even try at this point, so any guidance would be greatly appreciated.

Dave2e
  • 22,192
  • 18
  • 42
  • 50

2 Answers2

2

The problem lies with how the model is defined.

'poly(x, degree = i, raw = TRUE)'

In the model the degree term is defined to equal i. Since i is equal to 5 from exiting the last loop. The predict function is expecting a fifth order polynomial. "fit2" is only second order and thus the error.
The work around is to redefine i before calling the predict function.

plot(x,y)
abline(h = fit0$coef[1], col = 'blue')

i<-2
y2 = predict(fit2, newdata = data.frame(x))
lines(x, y2, col="red")

i<-5
y5 = predict(fit5, newdata = data.frame(x))
lines(x, y5, col="green")

There may be a way to redefine i inside the predict function, but I'm at a loss on how.

Also note your call to abline should define a horizontal line and not vertical.

enter image description here

Dave2e
  • 22,192
  • 18
  • 42
  • 50
1

The issue is with the poly function, predictions are not so easy using it this way.

Here is a different version of your code, saving the results into a list rather than to the gloval env, simply because I prefer it. The model matrix is constructed outside the lm function and saved for later use, i.e. predictions.

fit=list()

for(i in 1:5) {
  fit[[paste0("mm",i)]][["data"]]=as.data.frame(model.matrix(y~poly(x,i,raw=T),data_train))
  fit[[paste0("fit",i)]][["model"]]=lm(formula=y~.-1,data=fit[[paste0("mm",i)]][["data"]])
}

y2 = predict(fit[["fit2"]][["model"]], fit[["fit2"]][["data"]])
user2974951
  • 9,535
  • 1
  • 17
  • 24
  • I don't think I understand the your code inside the loop. You made `fit` an empty list to start, so how does it make sense to try to access a space inside it by name? Could you maybe show how to do it without using a list, since it seems that I don't fully understand how lists work in R? I should learn them better at some point, but right now I just need to figure out this regression problem. – Mikayla Eckel Cifrese Oct 06 '22 at 10:32
  • @MikaylaEckelCifrese I am sure that you can translate this to your style of code without many issues. But anyway, list elements can be indexed with numbers or string (among others), you can assign values to non-existing elements of a list, R will create a new element and assign, but you couldn't access them if they don't exist. – user2974951 Oct 06 '22 at 10:40
  • It's not really about my code style, it's that I legitimately don't understand what you're doing with the list indexing, like AT ALL. – Mikayla Eckel Cifrese Oct 06 '22 at 20:01