I am working on a Kaggle competition (House Prices: Advanced Regression Techniques) and have been trying to fit a ridge model to the data. I first combined the training and test data, did some data cleaning, and then separated them again so that I could fit the model on the training set and apply it to the test data.
traintest=rbind(train,test)
#Converting all chars into factors
library(dplyr)
traintest = traintest %>% mutate_if(is.character, as.factor)
After getting rid of some of the variables, I separated the two data sets.
train <- traintest[!is.na(traintest$SalePrice), ]
test <- traintest[is.na(traintest$SalePrice), ]
When I use the model.matrix function on the training data, it returns the design matrix as expected. But when I try it on the test data, it returns a matrix with zero rows: only the column names for all the variables.
x <- model.matrix(SalePrice~., train)[,-1]
x.test <- model.matrix(SalePrice~.,test)[,-1]
In the test data, SalePrice is entirely NA; it is the column I'm trying to predict.
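For reference, here is a minimal sketch that reproduces the behaviour on toy data. The data frame `toy` and its columns are made up for illustration and are not the Kaggle data. The empty result looks consistent with model.matrix building a model frame under the default na.action (na.omit), which drops every row whose response is NA. One possible workaround, shown at the end, is to drop the response column and use a one-sided formula; rows with NAs in the predictors would presumably still be dropped the same way.

```r
# Toy stand-in for the combined data (hypothetical columns).
toy <- data.frame(
  SalePrice = c(100, 200, NA, NA),
  LotArea   = c(10, 20, 30, 40),
  Street    = factor(c("Pave", "Grvl", "Pave", "Grvl"))
)

toy_train <- toy[!is.na(toy$SalePrice), ]
toy_test  <- toy[is.na(toy$SalePrice), ]

# Works: the response is observed for every training row.
x_train <- model.matrix(SalePrice ~ ., toy_train)[, -1]

# Reproduces the problem: every test row has an NA response,
# so all rows are omitted and only the column names remain.
x_empty <- model.matrix(SalePrice ~ ., toy_test)[, -1]

# Possible workaround: remove the response column and use a
# one-sided formula, so no NA response is involved at all.
predictors <- setdiff(names(toy_test), "SalePrice")
x_test <- model.matrix(~ ., toy_test[, predictors])[, -1]

dim(x_empty)
dim(x_test)
```

Because the factor levels were set before the split, `x_train` and `x_test` end up with the same columns in this sketch.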