I am looking for an elegant or efficient way to select columns in R's data.table.
Personally I value a flexible approach.
Therefore I tend to refer to columns by their characteristics rather than their names.
For example, I want to set the values of all columns to lower case.
If I include all columns in this operation, like so
dt[, lapply(.SD, tolower), .SDcols = names(dt)]
numeric and integer columns, too, will be converted to (lower case) character.
This is undesirable, and hence I first identify all character columns as follows:
char_cols <- as.character(names(dt[ , lapply(.SD, function(x) which(is.character(x)))]))
and subsequently pass char_cols to .SDcols:
dt[ , lapply(.SD, tolower), .SDcols = char_cols ]
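To make the problem reproducible, here is a minimal sketch on a made-up table (the column names id, name and score are purely illustrative). It shows the unwanted coercion and then selects the character columns with a base R type predicate (vapply with is.character). That is a swapped-in alternative to the lapply/which construction above, not necessarily the most idiomatic data.table way.

```r
library(data.table)

# Hypothetical toy table with mixed column types (illustrative only)
dt <- data.table(
  id    = 1:3,
  name  = c("Alice", "BOB", "Carol"),
  score = c(1.5, 2.5, 3.5)
)

# Applying tolower() to every column coerces the numeric columns to character
all_lower <- dt[, lapply(.SD, tolower), .SDcols = names(dt)]
coerced_classes <- sapply(all_lower, class)

# Pick out character columns by type rather than by name
char_cols <- names(dt)[vapply(dt, is.character, logical(1))]

# Lower-case only those columns, updating dt by reference
dt[, (char_cols) := lapply(.SD, tolower), .SDcols = char_cols]
```

After this, id stays integer and score stays numeric, while name is lower-cased in place.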
If, instead, all your columns are stored as character (for example, to avoid type conversion issues while reading the data), I would identify the genuinely non-numeric columns like this:
char_cols <- as.character(names(dt[ , lapply(.SD, function(x) which(all(is.na(as.numeric(x)))))]))
One should be certain, however, that no column is of mixed type, i.e. contains some character strings and some numeric values.
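The mixed-type caveat can be illustrated with a small base R sketch. The vectors and the helper name is_text_only are hypothetical. The helper applies the same all(is.na(as.numeric(x))) test as above, with suppressWarnings added only to silence the coercion warnings.

```r
# Hypothetical character vectors, as they might look after reading everything as text
numeric_like <- c("1", "2.5", "3")
text_only    <- c("a", "B", "c")
mixed        <- c("1", "b", "3")

# TRUE only when no element can be parsed as a number
is_text_only <- function(x) all(is.na(suppressWarnings(as.numeric(x))))

is_text_only(numeric_like)  # FALSE
is_text_only(text_only)     # TRUE
is_text_only(mixed)         # FALSE: one parsable value is enough to fail the test
```

A mixed column is therefore classified as numeric and skipped, so its string values would not be lower-cased.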
Does anyone have a suggestion to approach this more elegantly, or more efficiently?