I have a question about `dummyVars`, `predict`, and `model.matrix`. When a data frame has character or factor variables, we need to convert them into a binarized (dummy/indicator) matrix so that the model can use them. For example, Principal Component Analysis does not accept character or factor variables.

This is the sample data:

when <- data.frame(time = c("afternoon", "night", "afternoon",
                        "morning", "morning", "morning",
                        "morning", "afternoon", "afternoon"),
               day = c("Mon", "Mon", "Mon",
                       "Wed", "Wed", "Fri",
                       "Sat", "Sat", "Fri"),
               stringsAsFactors = TRUE)

levels(when$time) <- list(morning="morning",
                          afternoon="afternoon",
                          night="night")
levels(when$day) <- list(Mon="Mon", Tue="Tue", Wed="Wed", Thu="Thu",
                         Fri="Fri", Sat="Sat", Sun="Sun")

However, I see that most scientists use code like this:

library(caret)
mainEffects <- dummyVars(~ day + time, data = when)
predict(mainEffects, when)

to get the binarized matrix instead of:

model.matrix(~ day + time, data = when)

As far as I can see, the results are the same. Can anyone explain this habit to me?
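One way to check whether the two outputs really match is to compare their dimensions and column names. Below is a minimal base-R sketch (no caret required) that rebuilds the sample data and inspects what `model.matrix()` returns under its default contrasts, then under full indicator coding. The object names `mm` and `full` are only illustrative, and the full-indicator variant is my assumption about what a one-column-per-level encoding (which I understand `dummyVars` produces with its default `fullRank = FALSE`) would look like, not output taken from caret itself.

```r
# Sample data, with the complete level sets declared up front.
when <- data.frame(
  time = factor(c("afternoon", "night", "afternoon", "morning", "morning",
                  "morning", "morning", "afternoon", "afternoon"),
                levels = c("morning", "afternoon", "night")),
  day  = factor(c("Mon", "Mon", "Mon", "Wed", "Wed", "Fri", "Sat", "Sat", "Fri"),
                levels = c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"))
)

# Default treatment contrasts: an intercept column plus k - 1 indicator
# columns per factor (the first level of each factor is the reference).
mm <- model.matrix(~ day + time, data = when)
dim(mm)
colnames(mm)

# Full indicator coding: no intercept and one column per level of every
# factor. Passing identity contrast matrices turns off the level dropping.
full <- model.matrix(
  ~ day + time - 1,
  data = when,
  contrasts.arg = list(
    day  = contrasts(when$day,  contrasts = FALSE),
    time = contrasts(when$time, contrasts = FALSE)
  )
)
dim(full)
colnames(full)
```

If the matrix from `predict(mainEffects, when)` has the same number of columns as `full` rather than `mm`, the two approaches differ in whether a reference level is dropped, which matters for models that need a full-rank design matrix.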

Minh Ho
  • where does dummy vars come from? Please provide a [reproducible minimal example](https://stackoverflow.com/q/5963269/8107362). Especially, provide some sample data, e.g. with `dput()`, include all packages needed, and use the reprex-package. – mnist Aug 22 '21 at 15:12
  • Hello I updated it already (reproducible minimal example and package) – Minh Ho Aug 23 '21 at 04:33
