I can't figure out why I'm getting an error when I use predict(). I checked this post, but I'm still getting the same error. I split a data frame into two parts (train and test).

I fit a logistic model on train and applied it to test, but I'm getting an error. Here's the code:

train=rteam[which(rteam$season!="A"),]
test=rteam[which(rteam$season=="A"),]
length(train$outcome)
# [1] 163478
length(test$outcome)
# [1] 8246

logit.1=glm(outcome ~ hometeam + dpoints.diff + opoints.diff + outcome.sma5 + opp.outcome.sma5, data=train,
            family="binomial", na.action=na.exclude)


test$predict=predict(logit.1, data=test, type="response")
# Error in `$<-.data.frame`(`*tmp*`, "predict", value = c(NA, NA, NA, NA,  : 
#  replacement has 163478 rows, data has 8246

I keep getting this error. When I ran the predict() call on its own and stored the result in a standalone vector, it returned a vector with the same length as the train data frame.

predict=predict(logit.1, data=test, type="response")
length(predict)
# [1] 163478

Any ideas on what's going on? Is my code wrong?

Solution

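predict() takes newdata= rather than data=, doh! Since data= is not an argument of predict() for a glm, it is silently ignored, so the call returns the fitted values for the training data (padded with NA where na.exclude dropped rows), which is why the result had 163478 values.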

test$predict=predict(logit.1, newdata=test, type="response")
length(test$predict)
# [1] 8246
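
For anyone who wants to reproduce this without the original data, here is a minimal sketch on simulated data. The data frame `df`, the single predictor `x`, and the object names `fit`, `wrong`, and `right` are made up for illustration and are not part of the original model; only the `data=` versus `newdata=` behaviour is the point.

```r
# Simulated stand-in for the question's data; all names here are hypothetical.
set.seed(1)
n <- 200
df <- data.frame(
  season = rep(c("A", "B"), times = c(50, 150)),
  x      = rnorm(n)
)
df$outcome <- rbinom(n, size = 1, prob = plogis(0.5 * df$x))

train <- df[df$season != "A", ]
test  <- df[df$season == "A", ]

fit <- glm(outcome ~ x, data = train, family = "binomial",
           na.action = na.exclude)

# `data` is not an argument of predict() for a glm, so it is swallowed by `...`
# and the call returns fitted values for the training rows.
wrong <- predict(fit, data = test, type = "response")
length(wrong) == nrow(train)   # TRUE

# `newdata` is the argument that supplies the rows to score.
right <- predict(fit, newdata = test, type = "response")
length(right) == nrow(test)    # TRUE
```

Comparing the length of the prediction vector with nrow() of the data frame you meant to score is a cheap way to catch this, since the misnamed argument produces no warning.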
  • Your `predict` vector is longer than the number of rows in `test`, presumably due to `NA` values. – Thomas Feb 27 '14 at 20:27
  • I did a bit more digging and found out that predict() requires "newdata=" rather than "data=". I updated the solution at the end of my OP. – Richard Feb 27 '14 at 20:30
  • If you found a solution to your problem, you should post it as an answer and then, accept that answer as correct. – Thomas Feb 27 '14 at 20:32
  • Thanks Thomas, will do. New to stackexchange.... I have to wait 8 hours. – Richard Feb 27 '14 at 20:33
