I have fitted a linear model in R and used it to predict a value for new data. When I run the code, the prediction output is printed 600 times (the number of observations we have in the data set). Here is the code:

load(sports)
summary (sports)
ls(sports)
fit = lm(sport_score ~ sport_votes + sport_rating , data = sports)
summary(fit)

newdata = data.frame( sport_vote = 80, sport_rating = 7.7)

predict(fit, newdata, interval="predict") 

How can I print the output just once?

Zapata
  • The default behavior of `predict` is to give you a prediction for every complete case. If that's not what you want ... then read the help page and learn to use a `newdata` argument. – IRTFM Aug 14 '16 at 23:13
  • Why do you have `data = sports` in the `newdata=data.frame()` call? That's most certainly incorrect. Take it out and you should be fine. – MrFlick Aug 14 '16 at 23:20
  • @MrFlick I did that and I still have the same problem. – Zapata Aug 14 '16 at 23:31
  • Can you do `dput(head(sports))` and paste the output in your question to help us reproduce your issue? – Weihuang Wong Aug 14 '16 at 23:47
  • You need to make your example above [reproducible](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) so we can copy/paste into R to see the problem. Either your `newdata` data.frame doesn't have the rows you think it does or you've specified a formula with `$` in it (which is bad). Either way you are probably leaving something important out from your example above. – MrFlick Aug 15 '16 at 03:47
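
The point about `$` in a formula can be illustrated with a minimal sketch. It uses the built-in `iris` data rather than the asker's `sports` data, and the object names (`fit_dollar`, `fit_ok`, `new_obs`) are made up for illustration:

```r
# Illustration with the built-in iris data (not the asker's sports data).
# Formula written with `$`: the model term is literally `iris$Sepal.Width`.
fit_dollar <- lm(iris$Sepal.Length ~ iris$Sepal.Width)
new_obs <- data.frame(Sepal.Width = 3)

# `new_obs` cannot supply `iris$Sepal.Width`, so the term is evaluated on the
# full iris data. R warns, and one prediction per original row comes back.
pred_dollar <- suppressWarnings(predict(fit_dollar, new_obs))
length(pred_dollar)

# Formula written with bare column names plus `data =`: newdata is honoured.
fit_ok <- lm(Sepal.Length ~ Sepal.Width, data = iris)
pred_ok <- predict(fit_ok, new_obs)
length(pred_ok)
```

Without `suppressWarnings()`, the first `predict` call reports that `newdata` had 1 row while the variables found have 150 rows, which is the kind of symptom to look for in the original session.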

1 Answer

It should be:

predict(fit, newdata=newdata, interval="predict") 

The first `newdata` is the parameter name. The second `newdata` is the name of the object holding the values to predict from. If you don't give a value to the `newdata` parameter, `predict` falls back to its default, which as I said is the complete cases in `sports`.
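
As a rough sketch of the difference between the default and a supplied `newdata`, here is the same idea on the built-in `iris` data (an assumption, since the `sports` data is not available; `new_obs` and the other object names are illustrative):

```r
fit <- lm(Sepal.Length ~ Sepal.Width, data = iris)
new_obs <- data.frame(Sepal.Width = 3)

# No newdata: predictions for every row used to fit the model.
fitted_all <- predict(fit)
length(fitted_all)

# newdata supplied by name: one row, with columns fit, lwr and upr.
by_name <- predict(fit, newdata = new_obs, interval = "prediction")
dim(by_name)

# newdata supplied by position reaches the same parameter of predict.lm.
by_position <- predict(fit, new_obs, interval = "prediction")
identical(by_name, by_position)
```

In this sketch both spellings return a single row, so if a session still prints hundreds of rows, the contents of the `newdata` object are worth checking next.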

IRTFM
  • It still prints the output 600 times. – Zapata Aug 14 '16 at 23:41
  • I can't replicate this issue - `fit <- lm(Sepal.Length ~ Sepal.Width, data=iris); newdata <- data.frame(Sepal.Width=30); predict(fit, newdata, interval="pred")` works fine. – thelatemail Aug 14 '16 at 23:45
  • @Zapata: you need to show us the output of `str(newdata)` – IRTFM Aug 14 '16 at 23:59
  • `str(newdata)` gives `'data.frame': 1 obs. of 1 variable: $ Sepal.Width: num 30`. It is actually printing the same output 600 times. – Zapata Aug 15 '16 at 00:01
  • @Zapata - that sounds highly unlikely. If your `newdata` now has `Sepal.Width` as per the `iris` dataset, it shouldn't be returning 600 predictions. It would either error out because the variables don't match or return the 150 rows that `iris` has. I think you need to start a fresh R session and step through your code piece by piece. Something is not right here. – thelatemail Aug 15 '16 at 00:16