I am using the Carseats dataset from the ISLR package and I want an automated way to create new features (i.e. variables). To begin with, I want to create second-degree polynomial terms for all predictors of Sales.

I convert the factors to dummy variables using the caret library function dummyVars(). The code is the following:

library(caret)       # dummyVars()
library(data.table)  # setDT()

dummies <- dummyVars(~ ., data = Carseats_)
Carseats_d <- predict(dummies, newdata = Carseats_)
Carseats_d <- as.data.frame(Carseats_d)
setDT(Carseats_d)  # convert to data.table by reference

Then I use code found in a Stack Overflow post ("Select / assign to data.table variables which names are stored in a character vector"):

a1 <- data.table(a=1:5, b=6:10, c1=letters[1:5])
sapply(a1, class)  # show classes of columns
#         a           b          c1 
# "integer"   "integer" "character" 
# column name character vector
nm <- c("a", "b")
# Convert columns a and b to numeric type
a1[, j = (nm) := lapply(.SD, as.numeric ), .SDcols = nm ]

I adapt this code to my needs as follows:

> dim(Carseats_d)
[1] 400  15
> predictors <- setdiff(names(Carseats_d), "Sales")
> Carseats_d[, j = (predictors) := lapply(.SD, function(x){x^2} ), .SDcols = predictors ]
> dim(Carseats_d)
[1] 400  15

So the dimensions are unchanged: no new columns have been added.

Could you help me understand why my code does not work and how I should fix it?

Your advice will be appreciated.

rf7
  • You are just overwriting the existing columns (except "Sales") so the dimension doesn't change. If you want new columns added, you have to give them other names than those already present in the data – talat May 26 '17 at 06:51
  • More specifically, you can do `paste0(predictors, "2") := ...` – Frank May 26 '17 at 14:19
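
A minimal sketch of what the two comments describe, assuming data.table is installed. The table `dt` and its values are a made-up stand-in for Carseats_d, and `new_names` is a hypothetical helper name; the only change from the code in the question is that the left-hand side of `:=` uses names that do not exist yet, so the squared columns are appended instead of overwriting the originals.

```r
library(data.table)

# Toy stand-in for Carseats_d (values are invented for illustration)
dt <- data.table(Sales = c(9.5, 11.2, 10.1),
                 Price = c(120, 83, 80),
                 Age   = c(42, 65, 59))

predictors <- setdiff(names(dt), "Sales")

# New column names: each predictor name with "2" appended, as Frank suggests
new_names <- paste0(predictors, "2")

# Assigning to names that are not yet in the table adds columns by reference
dt[, (new_names) := lapply(.SD, function(x) x^2), .SDcols = predictors]

dim(dt)   # 3 rows, 5 columns: Sales, Price, Age, Price2, Age2
```

One thing to keep in mind when applying this to the dummy-coded data: squaring a 0/1 dummy column returns the same column, so it may be worth restricting `predictors` to the numeric, non-dummy variables.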