I'm new to R and I'm analyzing a dataset with qualitative and quantitative variables. The dataset is this one. I want to fit a ridge regression, so I first split the data into training and test sets:
library(caret)
library(magrittr)  # provides the %>% pipe used below
set.seed(3)
train_index <- sample(1:nrow(Data1), round(nrow(Data1) * 0.7))
train <- Data1[train_index, ]
nrow(train) / nrow(Data1)
test <- Data1[-train_index, ]
nrow(test) / nrow(Data1)
Then, to convert the qualitative variables into dummy variables:
train_mat <- dummyVars(`Time spent on social media` ~ ., data = train, fullRank = FALSE) %>%
  predict(newdata = train) %>%
  as.matrix()
test_mat <- dummyVars(`Time spent on social media` ~ ., data = test, fullRank = FALSE) %>%
  predict(newdata = test) %>%
  as.matrix()
The problem is that the train and test matrices have different numbers of columns, and I don't understand why.
I thought the dummy transformation itself might be the problem, so I also tried dummy_cols, but nothing changed.
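To narrow it down, the column names of the two matrices could be compared to see exactly which dummy columns are missing from one side. Below is a minimal sketch of that check in base R. The helper compare_columns and the toy matrices (with made-up column names) are hypothetical stand-ins for my real train_mat and test_mat, not part of caret.

```r
# Hypothetical helper: list the columns that appear in only one of two matrices.
compare_columns <- function(a, b) {
  list(
    only_in_a = setdiff(colnames(a), colnames(b)),
    only_in_b = setdiff(colnames(b), colnames(a))
  )
}

# Toy stand-ins for train_mat and test_mat (column names are invented).
toy_train <- matrix(0, nrow = 2, ncol = 3,
                    dimnames = list(NULL, c("Age", "PlatformA", "PlatformB")))
toy_test <- matrix(0, nrow = 2, ncol = 2,
                   dimnames = list(NULL, c("Age", "PlatformA")))

# Columns present in one matrix but not the other.
diff_cols <- compare_columns(toy_train, toy_test)
print(diff_cols)
```

On the real data the call would be compare_columns(train_mat, test_mat), which should show which dummy columns exist in only one of the two matrices.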