I have a data set of 60 observations, all with the same variables. 30 of them have values for wins (y); for the other 30 I have removed the value so that it can be predicted.

In SAS, when you want the model to predict a value for an unknown y (result), you put a dot in the data line for the Y value and run the regression. The model is fitted on the 30 observations that have a Y value, and predictions are then produced for the 30 that do not.

In R, I have set the Y values to NA for the observations I would like to predict. However, the model simply ignores those rows and does not give predicted results for them.

How can I have my model predict the missing Y values?

Joe
  • It depends a bit on what package you're using, so some code may help, but otherwise look at the `predict` function. – Reeza Feb 28 '18 at 20:45
  • When asking for help, you should include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. Show the code you tried that exhibited the behavior you describe. – MrFlick Feb 28 '18 at 21:02

1 Answer

If you are trying to predict on out-of-sample data, you can do it like this:

# Generate some example data, since none was provided
X <- matrix(data = rnorm(400), ncol = 4)
B <- c(0.5, -0.5, 2, 0)
y <- X %*% B
dt <- data.frame(cbind(y, X))
names(dt) <- c("y", paste0("x", 1:4))

# Fit the model on the in-sample (training) rows
train_dt <- dt[1:50, ]
mod <- lm(formula = y ~ ., data = train_dt)

# Predict on the out-of-sample rows
yhat <- predict(object = mod, newdata = dt[51:100, ])

# Prediction error; y was generated without noise, so predictions should match almost exactly
eps <- yhat - y[51:100]

# In this example every error should be close to zero
all(abs(eps) < 1e-10)
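
Applied to the setup in the question (60 rows, 30 of them with the response set to `NA`), the same idea can be sketched as follows. The data frame `games` and its columns `x1`, `x2`, and `wins` are made up for illustration. Under R's default `na.action` option (`na.omit`), `lm()` drops rows with a missing response when fitting, so the rows to be predicted only need to be passed explicitly as `newdata`.

```r
set.seed(1)

# Hypothetical data: 60 observations, two predictors, and a response `wins`
games <- data.frame(x1 = rnorm(60), x2 = rnorm(60))
games$wins <- 10 + 2 * games$x1 - games$x2 + rnorm(60, sd = 0.5)

# Blank out the response for the 30 rows we want predicted
games$wins[31:60] <- NA

# lm() fits on the 30 complete rows only (assuming the default na.action = na.omit)
mod_na <- lm(wins ~ x1 + x2, data = games)

# Ask for predictions explicitly for the rows with a missing response
to_predict <- is.na(games$wins)
games$wins_hat <- NA_real_
games$wins_hat[to_predict] <- predict(mod_na, newdata = games[to_predict, ])

head(games[to_predict, ])
```

Calling `predict(mod_na, newdata = games)` instead would return a value for all 60 rows, which is probably the closest match to the SAS behaviour described in the question.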
Aleh
  • I followed the exact same approach in my case, however predict.lm() returns only missing values ('NA') for prediction of `type = "response"` (i.e. probably the default). Do you have a hunch what could be the cause of this? There are some missing cases in my data, but I thought this wouldn't be a problem... – Dr. Fabian Habersack Dec 29 '20 at 16:18
  • I double checked and ran na.omit(), before estimating and predicting my response variable, but that didn't change a thing. – Dr. Fabian Habersack Dec 29 '20 at 16:19
  • Please provide an example dataset. The basic idea is to split the data into train and test sets and use the proper API for prediction. Also check that the data you use to train is reasonably balanced. For example, if the response is true/false and the train dataset contains only true while the test dataset contains false, you might get NaNs. – Aleh Jan 04 '21 at 09:51
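
One situation that produces `NA` predictions, and might be worth ruling out here, is missing values in the predictor columns of `newdata`: `predict.lm()` uses `na.action = na.pass` by default, so such rows come back as `NA`. Whether this is the cause in the case above cannot be told without the data. A small sketch with made-up data (the names `train`, `test`, and `fit` are illustrative only):

```r
set.seed(42)

# Hypothetical training data with complete cases
train <- data.frame(x1 = rnorm(20), x2 = rnorm(20))
train$y <- 1 + train$x1 + 0.5 * train$x2 + rnorm(20, sd = 0.1)
fit <- lm(y ~ x1 + x2, data = train)

# New data in which some predictor values are missing
test <- data.frame(x1 = c(0.1, NA, 0.3), x2 = c(1, 2, NA))

# Rows with any missing predictor yield NA (na.action = na.pass)
pred <- predict(fit, newdata = test)
is.na(pred)

# Count missing values per predictor to locate the problem
colSums(is.na(test))

# Predict only for rows where all predictors are present
ok <- complete.cases(test[, c("x1", "x2")])
pred_ok <- predict(fit, newdata = test[ok, ])
```

If every prediction is `NA`, checking `colSums(is.na(newdata))` for a predictor that is missing in all rows is a quick first diagnostic.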