
I am trying to use multiple linear regression in R, and I have fitted the model on data loaded from a file. But when I try to predict, I get these warning messages:

"Warning messages:
1: 'newdata' had 45 rows but variables found have 8676 rows
2: In predict.lm(reg, tin) :
  prediction from a rank-deficient fit may be misleading"

My code is simple:

yval = read.table("value_of_y.txt",header = T)
xval = read.table("Rmat.txt",header = T)
reg<-lm(yval$y~xval$x1+xval$x2+xval$x3+xval$x4+xval$x5+xval$x6+xval$x7+xval$x8+xval$x9+xval$x10+xval$x11+xval$x12+xval$x13+xval$x14)
summary(reg)
tin = read.table("Rtest.txt",header = T)
predict(reg,tin)

My training data (Rmat.txt):

x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 x12 x13 x14
1 -1 -1 1 1 1 1 1 1 1 1 1 1 1
1 -1 -1 1 1 1 1 1 1 1 1 1 1 1
1 -1 -1 1 1 1 1 1 1 1 1 1 1 1
-1 1 -1 1 1 1 1 1 1 1 1 1 1 1
-1 1 -1 1 1 1 1 1 1 1 1 1 1 1
-1 1 -1 1 1 1 1 1 1 1 1 1 1 1
-1 -1 -1 1 1 1 1 1 1 1 1 1 1 1
1 -1 -1 1 1 1 1 1 1 1 1 1 1 1
1 -1 -1 1 1 1 1 1 1 1 1 1 1 1

(value_of_y.txt):
5
-5
5
5
-5
5
5
-5
-5

My testing data, which I use for prediction (Rtest.txt):

x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 x12 x13 x14
-1 1 -1 1 1 1 1 1 -1 1 -1 1 -1 1
-1 1 -1 1 1 1 1 1 1 1 -1 1 -1 1
-1 -1 1 1 1 1 1 1 1 1 -1 1 -1 1
-1 -1 -1 1 1 1 1 1 1 1 -1 1 -1 1

How should I use the predict function instead?

charvi

1 Answer


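You need to be more careful when using the formula syntax with lm and predict. predict looks up each variable in the new data.frame by the name used in the formula, so those names must match the new column names exactly. That is not possible when you use the "$" syntax in the formula: the terms are then named xval$x1, xval$x2 and so on, tin has no such columns, and predict falls back to the original training vectors. That is why the first warning reports 8676 rows instead of 45. Use bare column names in the formula and pass the data separately. Try something like: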

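# Read the response and the predictors (both files need a header row)
yval <- read.table("value_of_y.txt", header = TRUE)
xval <- read.table("Rmat.txt", header = TRUE)
# Use bare column names in the formula and pass the combined data via `data`
reg <- lm(y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 + x10 + x11 + x12 + x13 + x14, data = cbind(yval, xval))
summary(reg)
# The new data must have columns named x1 to x14, matching the formula
tin <- read.table("Rtest.txt", header = TRUE)
predict(reg, newdata = tin)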
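
To see the difference in isolation, here is a minimal sketch on made-up data. The data frames `train` and `test` and their values are hypothetical and are not the files from the question; only the two ways of writing the formula matter.

```r
# Hypothetical toy data, not the files from the question
set.seed(1)
train <- data.frame(x1 = rnorm(20), x2 = rnorm(20))
train$y <- 1 + 2 * train$x1 - train$x2 + rnorm(20, sd = 0.1)
test <- data.frame(x1 = rnorm(4), x2 = rnorm(4))

# Formula written with `$`: the terms are named "train$x1" and "train$x2",
# so predict() cannot find them in `test` and reuses the training vectors
bad <- lm(train$y ~ train$x1 + train$x2)
bad_pred <- suppressWarnings(predict(bad, test))
length(bad_pred)   # 20, one value per training row

# Formula with bare column names plus `data =`: predict() finds x1 and x2 in `test`
good <- lm(y ~ x1 + x2, data = train)
good_pred <- predict(good, newdata = test)
length(good_pred)  # 4, one value per row of `test`
```

Without `suppressWarnings`, the first `predict` call emits the same "'newdata' had ... rows but variables found have ... rows" warning as in the question.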
MrFlick
  • I tried it, it still gives a warning "Warning message: In predict.lm(reg, tin) : prediction from a rank-deficient fit may be misleading" – charvi Nov 27 '14 at 05:15
  • That sounds like it may be a problem with your data. However, since you have not yet provided a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example), it's impossible to tell for sure. – MrFlick Nov 27 '14 at 05:18
  • I have added some sample training and testing data – charvi Nov 27 '14 at 06:05
  • Well, your sample data doesn't have nearly enough observations to estimate 14 parameters. Perhaps you can reduce the variables in your example. Just make sure you can actually run it. Use `dput()`s of your data as described in the link on how to write a reproducible example. Is all your real data that discrete? – MrFlick Nov 27 '14 at 06:07
  • Yes, I am working with discrete data. I have around 8000 observations for those 14 parameters. Is that not enough? Do you think the problem is due to the discrete values? – charvi Nov 27 '14 at 07:01
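
On the remaining rank-deficiency warning: in the sample training rows posted in the question, x3 is always -1 and x4 through x14 are always 1, so those columns carry the same information as the intercept and lm cannot estimate separate coefficients for them. Whether the full data set of around 8000 rows behaves the same way cannot be told from the sample. The sketch below shows one way to check, assuming a hypothetical data frame `d` with a single constant column; it is not the asker's data.

```r
# Hypothetical data: x3 is constant, so it is collinear with the intercept
d <- data.frame(
  x1 = c(1, 1, -1, -1, 1, -1, 1, -1),
  x2 = c(-1, 1, 1, -1, -1, 1, 1, -1),
  x3 = rep(1, 8),
  y  = c(5, -5, 5, 5, -5, 5, -5, -5)
)

fit <- lm(y ~ x1 + x2 + x3, data = d)

# Columns with a single distinct value cannot be separated from the intercept
constant_cols <- names(d)[sapply(d, function(col) length(unique(col)) == 1)]
constant_cols  # "x3"

# lm() reports NA for coefficients it could not estimate
dropped <- names(coef(fit))[is.na(coef(fit))]
dropped        # "x3"

# The fit is rank-deficient when its rank is below the number of coefficients
fit$rank < length(coef(fit))  # TRUE
```

If the same checks on the real data list constant or duplicated columns, dropping them from the formula should remove the warning. `alias(fit)` can also help find terms that are exact linear combinations of others.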