When I use predict.lm, I get either an error message or an incorrect result, and I am trying to understand what is causing that.

Before posting, I read several answers to similar questions, as shown in an example here. However, the approach suggested in those answers does not appear to work in my case, and I am trying to find out why and how to fix it.

To best explain my problem, consider the following MWE:

#------------------------------
# Fit least squares model
#------------------------------

data(mtcars)
a     <- mtcars$mpg
x     <- data.matrix(cbind(mtcars$wt, mtcars$hp))
xTest <- x[2,]  # We will use this for prediction later
fitCar <- lm(a ~ x)

#------------------------------
# Prediction for x = xTest
#------------------------------

# Method 1 (doesn't work) 
yPred <- predict(fitCar, newdata = data.frame(x = xTest), interval = "confidence")
Error: variable 'x' was fitted with type "nmatrix.2" but type "numeric" was supplied

# Method 2 (works, but as you may observe, it is incorrect) 
yPred <- predict(fitCar, newdata = data.frame(xTest), interval = "confidence")

         fit       lwr      upr
1  23.572329 22.456232 24.68843
2  22.583483 21.516224 23.65074
3  25.275819 23.974405 26.57723
4  21.265020 20.109318 22.42072
....
....
Warning message:
'newdata' had 2 rows but variables found have 32 rows 

Question: What is the right way to obtain yPred for xTest?

skaur

1 Answer

Always pass a data.frame to lm if you want to predict:

a     <- mtcars$mpg
x     <- data.matrix(cbind(mtcars$wt, mtcars$hp))
DF <- data.frame(a, x)
xTest <- x[2,]  # We will use this for prediction later
fitCar <- lm(a ~ ., data = DF)

yPred <- predict(fitCar, newdata = data.frame(X1 = xTest[1], X2 = xTest[2]), interval = "confidence")
#       fit      lwr      upr
#1 22.58348 21.51622 23.65074
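
To illustrate why the column names matter, here is a minimal sketch using the same mtcars setup. The names fitMatrix, termLabels and badNew are introduced only for this illustration. predict() looks up each variable of the formula in newdata by name; if a name is missing there, it falls back to the object of that name in the workspace, which is where the 32 rows in the warning come from.

```r
data(mtcars)
a     <- mtcars$mpg
x     <- data.matrix(cbind(mtcars$wt, mtcars$hp))
xTest <- x[2, ]

# Fit as in the question: the formula has a single term, the matrix "x"
fitMatrix  <- lm(a ~ x)
termLabels <- attr(terms(fitMatrix), "term.labels")
print(termLabels)

# data.frame(xTest) has one column called "xTest" and two rows,
# so there is no "x" in newdata and predict() uses the 32-row x instead
badNew <- data.frame(xTest)
print(names(badNew))
print(dim(badNew))

# Fit from a data.frame instead: each predictor is a named column
DF <- data.frame(a, x)
print(names(DF))
```

With the data.frame fit, newdata only needs columns carrying the same names as the predictors in DF.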
Roland
  • Thanks Roland. However, the proposed solution for finding yPred (i.e. predict(fitCar, newdata = data.frame(X1 = xTest[1], X2 = xTest[2]), interval="confidence") ) is not scalable. For example, if x and xTest have 1000 columns (i.e., 1000 features / predictors), we would have to write X1 = xTest[1], X2 = xTest[2], ..., X1000 = xTest[1000]. I wonder if there is a way around it. – skaur Jul 08 '15 at 15:46
  • `data.frame(x[2,, drop = FALSE])` – Roland Jul 08 '15 at 21:54
  • Roland: The solution suggested above is applicable if xTest = x[2,]. However, it is not applicable in a general case such as xTest <- as.numeric(cbind(4,5)). Now, given this general case (xTest = as.numeric(cbind(4,5)) ), how do we modify the solution that you suggested above? – skaur Jul 09 '15 at 05:09
  • Just create a data.frame: `data.frame(t(xTest))`. This is really basic stuff. Please go and study some tutorials. – Roland Jul 09 '15 at 08:10
  • I wish it was as straightforward as that. To ensure we are on the same page, here's what we are dealing with: `xTest <- as.numeric(cbind(4,5))`, and we then use the following command for prediction: `predict(fitCar, newdata = data.frame(t(xTest)) , interval="confidence")` . The problem still persists, as observed in the warning message shown below. `fit lwr upr 1 23.572329 22.456232 24.68843 2 22.583483 21.516224 23.65074 3 25.275819 23.974405 26.57723 4 21.265020 20.109318 22.42072... Warning message: 'newdata' had 1 row but variables found have 32 rows` – skaur Jul 09 '15 at 08:47
  • No, if you followed my advice there should be no warning. Good luck. – Roland Jul 09 '15 at 08:48
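
For readers following the comment thread, the pieces can be put together roughly as follows. This is a sketch, not necessarily what either commenter ran: it assumes the model is refit from DF as in the answer (the warning above is what one would expect if fitCar still came from lm(a ~ x)), and the names predictorNames and newDF are introduced here for illustration. Copying the predictor names from DF avoids typing X1, X2, ... by hand, however many columns there are.

```r
data(mtcars)
a  <- mtcars$mpg
x  <- data.matrix(cbind(mtcars$wt, mtcars$hp))
DF <- data.frame(a, x)

# Refit from the data.frame so every predictor is a named column
fitCar <- lm(a ~ ., data = DF)

# Arbitrary test point given as a plain numeric vector
xTest <- as.numeric(cbind(4, 5))

# One-row data.frame whose column names are copied from the training data
predictorNames <- setdiff(names(DF), "a")
newDF <- setNames(as.data.frame(t(xTest)), predictorNames)

yPred <- predict(fitCar, newdata = newDF, interval = "confidence")
print(yPred)
```

Because newDF has exactly one row and the same column names as the predictors, predict() should return a single row without the row-count warning.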