I have a question about `dummyVars`, `predict` and `model.matrix`. When a data frame has character or factor variables, we need to convert them into a binarized (indicator) matrix so that the model can use them. For example, Principal Component Analysis does not accept character or factor variables.
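As a small illustration of that point, here is a base-R sketch on a hypothetical toy data frame (`toy`, `x` and `colour` are made-up names, not from the question). `model.matrix()` expands the factor into 0/1 columns, and the resulting all-numeric matrix can then be passed to `prcomp()`.

```r
# Hypothetical toy data: one numeric column and one factor column
toy <- data.frame(x = c(1.2, 3.4, 2.2, 5.1),
                  colour = factor(c("red", "blue", "red", "green")))

# prcomp(toy) would stop with an error, because the factor column is not numeric.
# Expand the factor into 0/1 indicator columns instead; dropping the intercept
# ("- 1") makes model.matrix() keep one column per level of `colour`.
X <- model.matrix(~ x + colour - 1, data = toy)
print(X)

# The indicator matrix is all numeric, so PCA can run on it
pca <- prcomp(X)
print(summary(pca))
```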
This is the sample data:
```r
when <- data.frame(time = c("afternoon", "night", "afternoon",
                            "morning", "morning", "morning",
                            "morning", "afternoon", "afternoon"),
                   day = c("Mon", "Mon", "Mon",
                           "Wed", "Wed", "Fri",
                           "Sat", "Sat", "Fri"),
                   stringsAsFactors = TRUE)
levels(when$time) <- list(morning="morning",
                          afternoon="afternoon",
                          night="night")
levels(when$day) <- list(Mon="Mon", Tue="Tue", Wed="Wed", Thu="Thu",
                         Fri="Fri", Sat="Sat", Sun="Sun")
```
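The two `levels<-` calls in the sample data do more than rename: assigning a named list reorders the levels and also adds levels that never occur in the data, which is why some indicator columns end up all zero whichever encoder is used. A quick base-R check (the data frame is rebuilt so the snippet runs on its own):

```r
# Rebuild the sample data from the question
when <- data.frame(time = c("afternoon", "night", "afternoon",
                            "morning", "morning", "morning",
                            "morning", "afternoon", "afternoon"),
                   day = c("Mon", "Mon", "Mon",
                           "Wed", "Wed", "Fri",
                           "Sat", "Sat", "Fri"),
                   stringsAsFactors = TRUE)
levels(when$time) <- list(morning = "morning", afternoon = "afternoon", night = "night")
levels(when$day) <- list(Mon = "Mon", Tue = "Tue", Wed = "Wed", Thu = "Thu",
                         Fri = "Fri", Sat = "Sat", Sun = "Sun")

# Levels are now in the listed order, including Tue, Thu and Sun,
# which never occur in the data
print(levels(when$day))
print(table(when$day))  # the unused levels show a count of 0
```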
However, I see that most scientists use this kind of code:
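```r
library(caret)
# Build the encoder from the sample data, then apply it to get the indicator matrix
mainEffects <- dummyVars(~ day + time, data = when)
predict(mainEffects, newdata = when)
```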
to get the binarized matrix, instead of:
```r
model.matrix(~ day + time, data = when)
```
Even though the results seem to be the same, can anyone explain this habit to me?
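For reference, a base-R sketch of the shapes the two calls produce on the sample data. By default `model.matrix()` adds an intercept and uses treatment contrasts, so the first level of each factor gets no column of its own. The `contrasts.arg` idiom below is one common way to ask for one column per level, which is assumed here to match the content of the default `dummyVars()` output (`fullRank = FALSE`); the column names differ (`dayMon` here, `day.Mon` in caret).

```r
# Rebuild the sample data from the question
when <- data.frame(time = c("afternoon", "night", "afternoon",
                            "morning", "morning", "morning",
                            "morning", "afternoon", "afternoon"),
                   day = c("Mon", "Mon", "Mon",
                           "Wed", "Wed", "Fri",
                           "Sat", "Sat", "Fri"),
                   stringsAsFactors = TRUE)
levels(when$time) <- list(morning = "morning", afternoon = "afternoon", night = "night")
levels(when$day) <- list(Mon = "Mon", Tue = "Tue", Wed = "Wed", Thu = "Thu",
                         Fri = "Fri", Sat = "Sat", Sun = "Sun")

# Default: intercept + 6 day columns (Mon dropped) + 2 time columns (morning dropped)
mm <- model.matrix(~ day + time, data = when)
print(dim(mm))

# One indicator column per level: no intercept, identity "contrasts" for every factor
full <- model.matrix(~ day + time - 1, data = when,
                     contrasts.arg = lapply(when, contrasts, contrasts = FALSE))
print(dim(full))
print(colnames(full))
```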