I am working on a Kaggle competition (House Prices: Advanced Regression Techniques) and am trying to fit a ridge model to the data. I first combined the training and test data and did some data cleaning on the combined set. I then split them apart again so that I could fit the model on the training set and apply it to the test set.

traintest=rbind(train,test)

#Converting all chars into factors

library(dplyr)
traintest = traintest %>% mutate_if(is.character, as.factor)

After dropping some of the variables, I separated the two data sets again.

train <- traintest[is.na(traintest$SalePrice) == "FALSE",]
test <- traintest[is.na(traintest$SalePrice) == "TRUE",]

When I call model.matrix on the training data, it returns the design matrix as expected. But when I call it on the test data, it returns a matrix with zero rows that has only the column names for all the variables.

x <- model.matrix(SalePrice~., train)[,-1]
x.test <- model.matrix(SalePrice~.,test)[,-1]

In the test data the SalePrice column is all NA, since that is what I'm trying to predict.

Khasteh
  • You don't need to quote booleans. You can also write `traintest[!is.na(traintest$SalePrice),]` or `traintest[is.na(traintest$SalePrice),]` – akrun Jan 17 '20 at 20:24
  • It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. – MrFlick Jan 17 '20 at 20:27

1 Answer

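The issue is that test$SalePrice is entirely NA. model.matrix builds a model frame first, and the default na.action (na.omit) drops every row that has an NA in any variable used in the formula, including the response. Since SalePrice is in the formula, every test row is dropped.

Build the test matrix from a formula that does not include SalePrice, for example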

model.matrix(~ variable1 + variable2, test)

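or, dropping the SalePrice column by name so the result does not depend on column order,

model.matrix(~ ., test[setdiff(names(test), "SalePrice")])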
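
To make the cause and the fix concrete, here is a minimal sketch on toy data. The column names LotArea and Street and all the values are made up for illustration; only SalePrice comes from the question. It shows the zero-row result, the fix of leaving SalePrice out, and an alternative that keeps the original formula but passes a model frame built with na.action = na.pass.

```r
# Toy stand-ins for the Kaggle data (values are invented for illustration)
train <- data.frame(
  SalePrice = c(200, 150, 300),
  LotArea   = c(8000, 6000, 11000),
  Street    = factor(c("Pave", "Grvl", "Pave"))
)
test <- data.frame(
  SalePrice = NA_real_,                      # unknown, to be predicted
  LotArea   = c(7000, 9000),
  Street    = factor(c("Grvl", "Pave"), levels = levels(train$Street))
)

# Training matrix as in the question
x <- model.matrix(SalePrice ~ ., train)[, -1]

# Problem: SalePrice is in the formula, so na.omit drops every test row
x_bad <- model.matrix(SalePrice ~ ., test)
nrow(x_bad)

# Fix 1: build the matrix from the predictors only
predictors <- setdiff(names(test), "SalePrice")
x_test <- model.matrix(~ ., test[predictors])[, -1]

# Fix 2: keep the formula, but build the model frame without dropping NAs
mf <- model.frame(SalePrice ~ ., test, na.action = na.pass)
x_test2 <- model.matrix(SalePrice ~ ., mf)[, -1]

# Both fixes keep all test rows and the same columns as the training matrix
dim(x_test)
identical(colnames(x), colnames(x_test))
```

Note that with na.pass, rows with NAs in the predictors are also kept, so any missing predictor values should be imputed before predicting.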

jyr