I have two datasets, training and test. The training dataset is a 926x9 matrix: the first 8 columns are the feature vector x and the last column is the single-valued output y. The test dataset is a 103x8 matrix. I want to fit a linear regression on the training data and predict y for the test inputs.

trainData <- read.table("./traindata.txt")
X <- as.matrix(trainData[,1:8])
Y <- as.matrix(trainData[,9])
relation <- lm(Y~X)
testData <- read.table("./testinputs.txt")
testX <- as.matrix(testData[,1:8])
testOutputForY <- predict(relation, newdata = data.frame(X = testX))

The warning message I get is: 'newdata' had 103 rows but variables found have 926 rows. I am not sure what changes are needed to get it working.

  • 2
    To use `predict()` you really need to use a `data=` argument in your `lm()`. Do the column names match up between your train data and test inputs? When asking for help, you should include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. (We don't need your entire dataset, just enough to see what's really going on.) – MrFlick Mar 07 '18 at 17:10
  • Related: https://stackoverflow.com/questions/40710992/applying-lm-and-predict-to-multiple-columns-in-a-data-frame – MrFlick Mar 07 '18 at 17:11
  • Related: https://stackoverflow.com/questions/17380300/shortcut-using-lm-in-r-for-formula – MrFlick Mar 07 '18 at 17:11
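
A minimal sketch of the `data=` approach suggested in the comments. The likely cause of the warning is that `lm(Y~X)` records a single matrix term named `X`, while `data.frame(X = testX)` expands the matrix into columns such as `X.V1`, `X.V2`, and so on; `predict()` then finds no `X` in `newdata` and falls back to the 926-row `X` in the workspace. The data below is simulated as a stand-in for the real files, and it assumes the columns carry the default `V1`...`V9` names that `read.table()` assigns when there is no header.

```r
# Simulated stand-ins for traindata.txt and testinputs.txt (assumption:
# no header row, so columns are named V1, V2, ... as read.table() would do).
set.seed(1)
trainData <- as.data.frame(matrix(rnorm(926 * 9), nrow = 926))
testData  <- as.data.frame(matrix(rnorm(103 * 8), nrow = 103))

# Fit on the data frame itself so the model stores V1..V8 as predictor names.
# `V9 ~ .` means "regress V9 on every other column in `data`".
relation <- lm(V9 ~ ., data = trainData)

# newdata only needs columns whose names match the predictors (V1..V8).
testOutputForY <- predict(relation, newdata = testData)

# One prediction per test row.
length(testOutputForY)
```

With the real files, `trainData` and `testData` would come straight from `read.table()` with no `as.matrix()` conversion; if the files do have headers, the formula and the test column names would need to be adjusted to match.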
