This question is related to "Cleaning up factor Levels collapsing multiple Level labels", but I would like to extend it to a data.table and collapse the factor levels for a subset of its columns. I always struggle with using lapply inside a data.table.

Here is my MWE, showing what I would like to achieve by using levels() on the two columns separately:

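library(data.table)

df <- data.table(Index = 1:3, factor1 = c("Yes", "No", "0"), factor2 = c("yes", "no", "no"))
str(df)
subset_factor <- c("factor1", "factor2")
# Named list: each name is the new level, each element lists the old levels to merge
label.yesno <- list("Yes" = c("Yes", "yes"),
                    "No"  = c("No", "no"))
# Convert the selected columns to factors (.SDcols spelled out rather than abbreviated to .SD)
df[, (subset_factor) := lapply(.SD, factor), .SDcols = subset_factor]
str(df)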

levels(df$factor1)<-label.yesno
levels(df$factor2)<-label.yesno
df

I was hoping that I could use the list directly when I create the factors:

df[, (subset_factor) := lapply(.SD, factor, labels = label.yesno), .SDcols = subset_factor]

or that I could somehow apply levels() in a separate step, but I cannot find anything similar. I would also like the "0" to be converted to NA, as happens in my MWE.

Max M

1 Answer

I think you could write a simple helper function to simplify this process:

# df<-data.table(Index=1:3,factor1=c("Yes", "No", "0"), factor2=c("yes","no","no"))
# str(df)
# subset_factor<-c("factor1", "factor2")
# label.yesno<- list("Yes" = c("Yes","yes"),
#                    "No"   = c("No", "no"))

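# Convert x to a factor, then collapse its levels according to the named list lab
f <- function(x, lab){
    res <- factor(x)
    levels(res) <- lab  # levels not listed in lab become NA
    res
}
# Apply the helper to every column named in subset_factor
df[, (subset_factor) := lapply(.SD, f, lab = label.yesno), .SDcols = subset_factor]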
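
To see why the helper also handles the "0" case, here is a minimal base R sketch of what assigning a named list to levels() does to a single vector. The vector x is a made-up example that mirrors factor1 from the question, with one extra "yes" added for illustration; it is not part of the original data.

```r
# Hypothetical sample vector, loosely mirroring factor1 from the question
x <- factor(c("Yes", "No", "0", "yes"))

# Each list name becomes a new level; each element lists the old levels merged into it.
# Old levels that appear in no element (here "0") end up as NA.
levels(x) <- list(Yes = c("Yes", "yes"), No = c("No", "no"))

print(x)
# [1] Yes  No   <NA> Yes
# Levels: Yes No
```

The new levels follow the order of the list, so here "Yes" comes before "No" regardless of how the original levels were sorted.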
Frank
mt1022
  • Perfect, and seeing your solution I feel stupid for not having thought of that myself... – Max M Apr 10 '18 at 12:19
  • @MaxM, I merely have a little more experience. You'll be a master as your experience accumulates. :) – mt1022 Apr 10 '18 at 13:01