collection <- data.frame(col1=X1,col2=X2,col3=X3,col4=X4)
k <- 5
ind <- sample(seq(1,k), length(X1), replace=TRUE)

test_ind = which(ind==1)
train<-collection[-test_ind,]
fit<-lm(X1~poly(X2,2,raw=T)+X3+X4+X2:X3,data=train)
model1_resid<-predict(fit,collection[test_ind,2:4])

Warning message: 'newdata' had 105 rows but variables found have 444 rows

BTW: length(test_ind) is 105 and nrow(train)=444

I plan to run cross-validation, but the code above generates this warning. I already followed other posts on this forum and subset the data before calling lm, so why is there still a warning? Can anyone point out the bug? Thanks

Jin

1 Answer


I think you need to use the same variable names in both places: if you want to use columns 2, 3 and 4 for your prediction, their names should be X1, X2, X3, as used in the model (not col2, col3 and col4, as you have now).

Try for example colnames(collection) = c("X0", "X1", "X2", "X3") before the predict call and it should work (although I don't understand if you really wanted to use col2, col3 and col4 for predicting).
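To illustrate the naming point, here is a minimal sketch using made-up data (the simulated values and the helper name `test` are assumptions, not part of the question). When predict() cannot find the formula's variables in newdata, it falls back to whatever objects of those names it finds elsewhere, which is what produces the row-count warning. Naming the data frame columns exactly as the formula refers to them avoids that:

```r
set.seed(1)
n <- 549  # 444 training rows + 105 test rows in the question

# Synthetic stand-ins for the question's vectors (values are made up)
X2 <- rnorm(n); X3 <- rnorm(n); X4 <- rnorm(n)
X1 <- 1 + 2 * X2 - X2^2 + 0.5 * X3 + X4 + 0.3 * X2 * X3 + rnorm(n)

# Name the columns exactly as the formula refers to them
collection <- data.frame(X1 = X1, X2 = X2, X3 = X3, X4 = X4)

# Drop the global copies so lm() and predict() can only use the data frames
rm(X1, X2, X3, X4)

k <- 5
ind <- sample(seq_len(k), nrow(collection), replace = TRUE)
test_ind <- which(ind == 1)
train <- collection[-test_ind, ]
test  <- collection[test_ind, ]

fit  <- lm(X1 ~ poly(X2, 2, raw = TRUE) + X3 + X4 + X2:X3, data = train)
pred <- predict(fit, newdata = test)  # one prediction per test row, no warning
```

Removing the global vectors is optional, but it makes a column-name mismatch fail loudly instead of silently fitting on the wrong data.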

Fanny
  • I corrected the mistake in my code, but it still does not work. Would you look again? Thanks – Jin Mar 13 '14 at 14:53
  • If you used the code from your question, then your colnames are still col1, col2, col3 and col4; they should be X1, X2, X3 and X4. You need to either change the first line to collection <- data.frame(X1=X1,X2=X2,X3=X3,X4=X4) or change the colnames later (but before the predict call in the last line) with: colnames(collection) = c("X1", "X2", "X3", "X4") – Fanny Mar 13 '14 at 21:21
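Since the stated goal is cross-validation, the same idea extends to a loop over all folds. The following is only a sketch under assumptions: the data are simulated, the names `fold` and `rmse` are hypothetical, and RMSE is just one possible error measure (the question does not say which one is wanted).

```r
set.seed(42)
n <- 200

# Made-up data; columns are named as the formula expects
collection <- data.frame(X2 = rnorm(n), X3 = rnorm(n), X4 = rnorm(n))
collection$X1 <- 1 + collection$X2 + 0.5 * collection$X3 +
  collection$X4 + rnorm(n)

k <- 5
# Shuffled fold labels with (nearly) equal fold sizes
fold <- sample(rep(seq_len(k), length.out = n))

rmse <- numeric(k)
for (i in seq_len(k)) {
  test  <- collection[fold == i, ]
  train <- collection[fold != i, ]
  # Fit on the training folds only; all variables come from `train`
  fit   <- lm(X1 ~ poly(X2, 2, raw = TRUE) + X3 + X4 + X2:X3, data = train)
  pred  <- predict(fit, newdata = test)
  rmse[i] <- sqrt(mean((test$X1 - pred)^2))
}

mean(rmse)  # average out-of-fold RMSE
```

Using rep() with length.out before shuffling gives balanced folds, whereas sample(..., replace=TRUE) as in the question lets fold sizes vary.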