I'm trying to convert a few columns of a data.table from integer to a categorical (factor) type: the ones whose names contain the string "_cat". The data.table has over 700 columns, and the "_cat" columns are scattered throughout it, so I don't want to go through them by hand to find which ones to convert.

First, I get a logical vector indicating which columns have '_cat' in their names:

cat_id <- grepl('_cat', colnames(dt))

Somehow I have to use this cat_id logical vector to convert the columns where cat_id is TRUE to factor. I'm not sure how to use the by clause to exclude the columns without the pattern (those where cat_id is FALSE):

dt <- dt[, lapply(.SD, as.factor), by = ??? ]
Ankhnesmerira

1 Answer

This is not a group-by operation, so by is not needed. To select the columns, use .SDcols, then assign (:=) the output back to the columns of interest:

dt[, (cat_id) := lapply(.SD, factor), .SDcols = cat_id ]

where cat_id holds the matching column names (note value = TRUE) rather than a logical vector:

cat_id <- grep('_cat', colnames(dt), value = TRUE)
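For anyone who wants to try this end to end, here is a minimal, self-contained sketch of the same approach. The toy table and its column names (id, color_cat, size_cat, value) are made up for illustration and are not from the question; only the grep / .SDcols / := pattern comes from the answer above.

```r
library(data.table)

# Hypothetical toy table; the column names are illustrative only
dt <- data.table(
  id        = 1:4,
  color_cat = c(1L, 2L, 1L, 3L),
  size_cat  = c(10L, 20L, 10L, 10L),
  value     = c(0.5, 1.2, 3.3, 4.1)
)

# Names of the columns that contain "_cat" (value = TRUE returns names, not positions)
cat_id <- grep("_cat", colnames(dt), value = TRUE)

# Convert only those columns to factor, modifying dt by reference
dt[, (cat_id) := lapply(.SD, factor), .SDcols = cat_id]

# Inspect the resulting column classes
col_classes <- sapply(dt, class)
print(col_classes)
```

Because := modifies dt by reference, there is no need to assign the result back with dt <- ... . Also note that the pattern '_cat' matches anywhere in a column name; if only names ending in "_cat" should be converted, a pattern such as '_cat$' would be one way to restrict it.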
akrun