Best way to select columns from a data.table by type

Question

I am looking for an elegant or efficient way to select columns in R's data.table.

Personally I value a flexible approach.

Therefore I tend to refer to columns by their characteristics rather than their names.

For example, I want to set the values of all columns to lower case.

If I include all columns in this operation, like so

dt[, lapply(.SD, tolower), .SDcols = names(dt)]

numeric and integer columns, too, will be converted to (lower case) character.
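A minimal sketch of that effect on a made-up table (the column names and values are hypothetical and only serve as illustration):

```r
library(data.table)

# Hypothetical sample data: one integer, one character, one numeric column
dt <- data.table(id = 1:3, name = c("Ann", "BOB", "Cy"), score = c(1.5, 2, 3.25))

# tolower() coerces its input to character, so every column changes type
res <- dt[, lapply(.SD, tolower), .SDcols = names(dt)]

# Inspect the resulting column classes
sapply(res, class)
```

Every column of `res` comes back as character, including `id` and `score`.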

This is undesirable, and hence I first identify all character columns as follows:

char_cols <- as.character(names(dt[ , lapply(.SD, function(x) which(is.character(x)))]))

and subsequently pass char_cols to .SDcols

dt[ , lapply(.SD, tolower), .SDcols = char_cols ]

If, instead, all your columns are character (for example, to avoid type conversion issues while reading the data), I would go about it like this:

char_cols <- as.character(names(dt[ , lapply(.SD, function(x) which(all(is.na(as.numeric(x)))))]))

One should be certain, however, that no column is of mixed type, i.e. contains some character strings and some numeric values.
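To illustrate why mixed columns are a problem for this check, here is a small sketch using made-up vectors. The helper name `looks_character` is hypothetical; it simply wraps the `all(is.na(as.numeric(x)))` test from above and silences the coercion warning.

```r
# Hypothetical character vectors, as they might look after reading everything as text
pure_text <- c("a", "b", "c")
pure_num  <- c("1", "2.5", "3")
mixed     <- c("1", "b", "3")

# TRUE only when no element can be parsed as a number
looks_character <- function(x) all(is.na(suppressWarnings(as.numeric(x))))

looks_character(pure_text)  # TRUE
looks_character(pure_num)   # FALSE
looks_character(mixed)      # FALSE
```

The mixed vector is not flagged as character because a single parsable value is enough to make the test fail, so such a column would be treated like a numeric one.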

Does anyone have a suggestion to approach this more elegantly, or more efficiently?

Ronak Shah · Accepted Answer · 2020-04-19T14:07:35.240

3

You can pass a logical/character vector to .SDcols.

For character columns, we can do

library(data.table)
cols <- names(Filter(is.character, dt))
dt[, (cols) := lapply(.SD, tolower), .SDcols = cols]
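As a rough sketch of the difference between the two forms, using a made-up table (names and values are hypothetical): a logical vector in `.SDcols` without `:=` returns only the selected columns, while the `:=` form updates them in place and keeps everything else.

```r
library(data.table)

# Hypothetical sample data
dt <- data.table(id = 1:3, name = c("Ann", "BOB", "Cy"), score = c(1.5, 2, 3.25))

# Logical vector marking the character columns
is_chr <- sapply(dt, is.character)

# Without `:=`, the result contains only the selected columns
only_chr <- dt[, lapply(.SD, tolower), .SDcols = is_chr]

# With `:=`, the selected columns are modified by reference; the rest stay untouched
cols <- names(dt)[is_chr]
dt[, (cols) := lapply(.SD, tolower), .SDcols = cols]
```

After the last line, `dt` still has `id` and `score` with their original types, and `name` is lower case.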

edited Apr 19 '20 at 14:07

answered Apr 19 '20 at 10:37

Ronak Shah

That is definitely more elegant. What about the second case? I load all my data as character type to avoid conversion issues with e.g. dates. Can you think of a better way to select those columns that are actual character type? – o_v Apr 19 '20 at 10:56
@o_v You can use `type.convert` to get data in their respective classes. `dt <- type.convert(dt, as.is = TRUE)` – Ronak Shah Apr 19 '20 at 11:02
So I think the disadvantage of the `.SDcols = sapply(dt, is.character)` solution is that I am only left with the character columns. So if I want to keep both character and numeric columns, I'm still better off with my solution. – o_v Apr 19 '20 at 13:58
See updated answer. This will help you to retain all the columns in `dt`. – Ronak Shah Apr 19 '20 at 14:08
Neat! I knew I was going about it too cumbersomely. – o_v Apr 19 '20 at 15:06
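Putting the `type.convert` suggestion from the comments together with the answer, the all-character case might look roughly like this. The table is made up for illustration, and the sketch assumes the columns are cleanly typed (no mixed columns).

```r
library(data.table)

# Hypothetical table where every column was read as character
dt <- data.table(
  id    = c("1", "2", "3"),
  name  = c("Ann", "BOB", "Cy"),
  score = c("1.5", "2", "3.25")
)

# Let R infer a type per column; as.is = TRUE keeps strings as character, not factor
dt <- type.convert(dt, as.is = TRUE)

# Only the columns that are still character after conversion are lower-cased
cols <- names(Filter(is.character, dt))
dt[, (cols) := lapply(.SD, tolower), .SDcols = cols]
```

A column that mixes numbers and text stays character after `type.convert`, so it would be lower-cased along with the genuine text columns.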

score 0 · Answer 2 · answered Apr 19 '20 at 17:22

0

We can use

library(data.table)
cols <- names(which(sapply(dt, is.character)))
dt[, (cols) := lapply(.SD, tolower), .SDcols = cols]
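Two possible variations on the same idea, sketched on made-up data. The first swaps `sapply` for `vapply`, which always returns a logical vector of the declared shape. The second assumes a newer data.table release in which `.SDcols` accepts a predicate function directly; if your installed version predates that feature, only the first form applies.

```r
library(data.table)

# Hypothetical sample data
dt <- data.table(id = 1:3, name = c("Ann", "BOB", "Cy"), city = c("Oslo", "ROME", "Lima"))

# vapply() guarantees one logical value per column
cols <- names(dt)[vapply(dt, is.character, logical(1))]
dt[, (cols) := lapply(.SD, tolower), .SDcols = cols]

# Assumption: the installed data.table supports a function in .SDcols
upper_view <- dt[, lapply(.SD, toupper), .SDcols = is.character]
```

Here `dt` is updated in place, while `upper_view` is a separate table holding only the character columns.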

answered Apr 19 '20 at 17:22

akrun
