I want to change some columns from "chr" or "num" to "factor" while leaving the remaining columns unaffected. Here is my code:

> library("data.table")
> titanic <- fread("titanic.csv")
> str(titanic)
Classes ‘data.table’ and 'data.frame':  887 obs. of  8 variables:
 $ Survived               : int  0 1 1 1 0 0 0 0 1 1 ...
 $ Pclass                 : int  3 1 3 1 3 3 1 3 3 2 ...
 $ Name                   : chr  "Mr. Owen Harris Braund" "Mrs. John Bradley (Florence Briggs Thayer) Cumings" "Miss. Laina Heikkinen" "Mrs. Jacques Heath (Lily May Peel) Futrelle" ...
 $ Sex                    : chr  "male" "female" "female" "female" ...
 $ Age                    : num  22 38 26 35 35 27 54 2 27 14 ...
 $ Siblings/Spouses Aboard: int  1 1 0 1 0 0 0 3 0 1 ...
 $ Parents/Children Aboard: int  0 0 0 0 0 0 0 1 2 0 ...
 $ Fare                   : num  7.25 71.28 7.92 53.1 8.05 ...
> titanic_tmp <- titanic[, lapply(.SD, function(x) factor(x, levels = unique(x))), .SDcols = c(1, 2, 4, 6, 7)]
> titanic <- cbind(titanic_tmp, titanic[, c(3, 5, 8)])

The code above solves my problem, but it is cumbersome. I know that the ":=" operator can update data.table columns in place. How can I use ":=" here to update columns 1, 2, 4, 6, and 7? Or is there another convenient or simple way to do this?

Zzx Zhang

1 Answer

The canonical and most-efficient way in data.table to modify several columns in place with an lapply or similar method is via a vector of names, both in .SDcols=, and in the LHS of the := assignment:

cols <- names(titanic)[c(1,2,4,6,7)]
titanic[, c(cols) := lapply(.SD, factor), .SDcols = cols]
#    Survived Pclass                                    Name    Sex   Age Siblings/Spouses Aboard Parents/Children Aboard  Fare
#      <fctr> <fctr>                                  <char> <fctr> <num>                  <fctr>                  <fctr> <num>
# 1:        0      3                  Mr. Owen Harris Braund   male    22                       1                       0  7.25
# 2:        1      1 Mrs. John Bradley (Florence Briggs T... female    38                       1                       0 71.28
# 3:        1      3                   Miss. Laina Heikkinen female    26                       0                       0  7.92
# 4:        1      1 Mrs. Jacques Heath (Lily May Peel) F... female    35                       1                       0 53.10

## and if you need the columns reordered,
setcolorder(titanic, c(1,2,4,6,7,3,5,8))
titanic
#    Survived Pclass    Sex Siblings/Spouses Aboard Parents/Children Aboard                                    Name   Age  Fare
#      <fctr> <fctr> <fctr>                  <fctr>                  <fctr>                                  <char> <num> <num>
# 1:        0      3   male                       1                       0                  Mr. Owen Harris Braund    22  7.25
# 2:        1      1 female                       1                       0 Mrs. John Bradley (Florence Briggs T...    38 71.28
# 3:        1      3 female                       0                       0                   Miss. Laina Heikkinen    26  7.92
# 4:        1      1 female                       1                       0 Mrs. Jacques Heath (Lily May Peel) F...    35 53.10
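
Since titanic.csv is not included here, the same pattern can be tried on a small stand-in table. This is only a sketch: the table `dt` and its values are made up for illustration, and only a subset of the original columns is mimicked.

```r
library(data.table)

# Toy stand-in for the titanic data (made-up values, not the real file)
dt <- data.table(
  Survived = c(0L, 1L, 1L, 0L),
  Name     = c("a", "b", "c", "d"),
  Sex      = c("male", "female", "female", "male"),
  Age      = c(22, 38, 26, 35)
)

# Columns to convert, given by name
cols <- c("Survived", "Sex")

# := assigns by reference, so dt itself is updated and no copy is assigned back
dt[, (cols) := lapply(.SD, factor), .SDcols = cols]

# Only the selected columns are now factors; Name and Age keep their types
sapply(dt, class)
```

Wrapping the name vector as `(cols)` on the left-hand side has the same effect as `c(cols)`: both tell data.table to use the contents of `cols` as the target column names rather than creating a column literally named "cols".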

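FYI, I shortened lapply(.SD, function(x) factor(x, levels = unique(x))) to just lapply(.SD, factor). By default, factor() takes its levels from the sorted unique values, whereas levels = unique(x) keeps them in order of first appearance; the set of levels is the same either way. Revert to the longer lapply form if the level order matters to you.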
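
A quick base-R check of how the two forms order their levels, using a made-up vector `x` rather than the real Sex column:

```r
# Made-up vector for illustration
x <- c("male", "female", "female", "male")

# Default: levels are the sorted unique values
levels(factor(x))
# "female" "male"

# Explicit levels = unique(x): levels follow order of first appearance
levels(factor(x, levels = unique(x)))
# "male" "female"
```

The level order affects things such as the integer codes from as.integer() and the reference level in model fitting, so it is worth choosing deliberately.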

r2evans