0

I am looking for an elegant or efficient way to select columns in R's data.table.

Personally I value a flexible approach.

Therefore I tend to refer to columns by their characteristics rather than their names.

For example, I want to set the values of all columns to lower case.

If I include all columns in this operation, like so

dt[, lapply(.SD, tolower),.SDcols = names(dt)]

numeric and integer columns, too, will be converted to (lower case) character.

This is undesirable, so I first identify all character columns as follows:

char_cols <- as.character(names(dt[ , lapply(.SD, function(x) which(is.character(x)))]))

and subsequently pass char_cols to .SDcols

dt[ , lapply(.SD, tolower), .SDcols = char_cols ]

If instead all your columns are character (for example, to avoid type conversion issues while reading the data), I would go about it like this:

char_cols <- as.character(names(dt[ , lapply(.SD, function(x) which(all(is.na(as.numeric(x)))))]))

One should be certain, however, that no column is of mixed type, i.e. contains both character strings and numeric values.

Does anyone have a suggestion to approach this more elegantly, or more efficiently?

o_v

2 Answers

3

You can pass a logical/character vector to .SDcols.

For character columns, we can do

library(data.table)
cols <- names(Filter(is.character, dt))
dt[, (cols) := lapply(.SD, tolower), .SDcols = cols]
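
As a small worked example of the approach above, here is a minimal sketch on a made-up table. The column names `id`, `name`, and `city` are illustrative assumptions, not taken from the question. It should lower-case the character columns in place and leave the integer column untouched.

```r
library(data.table)

# Hypothetical toy data: one integer column, two character columns
dt <- data.table(id = 1:3,
                 name = c("Ann", "BOB", "Cy"),
                 city = c("Oslo", "ROME", "Bern"))

# Names of the columns for which is.character() is TRUE
cols <- names(Filter(is.character, dt))

# Update only those columns by reference; all other columns are kept as-is
dt[, (cols) := lapply(.SD, tolower), .SDcols = cols]

# A logical vector also works for .SDcols, but without `:=` the result
# contains only the selected columns
upper_only <- dt[, lapply(.SD, toupper), .SDcols = sapply(dt, is.character)]
```

Assigning with `:=` is what keeps the non-character columns in `dt`; the plain `lapply(.SD, ...)` form returns a new table holding just the columns named in `.SDcols`.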
Ronak Shah
  • That is definitely more elegant. What about the second case? I load all my data as character type to avoid conversion issues with e.g. dates. Can you think of a better way to select those columns that are actual character type? – o_v Apr 19 '20 at 10:56
  • @o_v You can use `type.convert` to get data in their respective classes. `dt <- type.convert(dt, as.is = TRUE)` – Ronak Shah Apr 19 '20 at 11:02
  • So I think the disadvantage of the `.SDcols = sapply(dt, is.character)` solution is that I am only left with the character columns. So if I want to keep both character and numeric columns, I'm still better off with my solution. – o_v Apr 19 '20 at 13:58
  • See updated answer. This will help you to retain all the columns in `dt`. – Ronak Shah Apr 19 '20 at 14:08
  • Neat! I knew I was going about it too cumbersomely. – o_v Apr 19 '20 at 15:06
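
The `type.convert` suggestion from the comments can be combined with the column selection roughly as follows. This is only a sketch on a hypothetical table in which every column was read as character; the column names and values are assumptions made for illustration.

```r
library(data.table)

# Hypothetical toy data, read entirely as character
dt <- data.table(id = c("1", "2", "3"),
                 score = c("1.5", "2", "NA"),
                 name = c("Ann", "BOB", "Cy"))

# Let R infer a class per column; as.is = TRUE keeps strings as character
# instead of converting them to factor
dt <- type.convert(dt, as.is = TRUE)

# Columns that are still character after conversion
cols <- names(Filter(is.character, dt))

# Lower-case only those columns, by reference
dt[, (cols) := lapply(.SD, tolower), .SDcols = cols]
```

A column that mixes text and numbers stays character after `type.convert`, and so do date-like strings, so both would still be lower-cased by the last step.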
0

We can use

library(data.table)
cols <- names(which(sapply(dt, is.character)))
dt[, (cols) := lapply(.SD, tolower), .SDcols = cols]
akrun