R data.table multi column conversion by names

Question

Let DT be a data.table:

DT<-data.table(V1=factor(1:10),
           V2=factor(1:10),
           ...
           V9=factor(1:10))

Is there a better/simpler method to do multicolumn factor conversion like this:

DT[,`:=`(
  Vn1=as.numeric(V1),
  Vn2=as.numeric(V2),
  Vn3=as.numeric(V3),
  Vn4=as.numeric(V4),
  Vn5=as.numeric(V5),
  Vn6=as.numeric(V6),
  Vn7=as.numeric(V7),
  Vn8=as.numeric(V8),
  Vn9=as.numeric(V9)
)]

Column names are totally arbitrary.

score 4 · Accepted Answer · edited May 23 '17 at 12:21

Yes. The most efficient way is probably to run set() in a for loop.

First, set the columns you want to modify (you can also choose all of them by using names(DT) instead):

cols <- c("V1", "V2", "V3")

Then just run the loop:

for (j in cols) set(DT, i = NULL, j = j, value = as.numeric(DT[[j]]))

A slightly less efficient but more readable alternative is the following (note the parentheses around cols, which make data.table evaluate the variable instead of creating a column literally named cols):

## if you choose all the names in DT, you don't need to specify the `.SDcols` parameter
DT[, (cols) := lapply(.SD, as.numeric), .SDcols = cols]
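
The question creates new columns (Vn1, Vn2, ...) rather than overwriting the originals. Here is a minimal sketch of the same `:=` idiom writing to new columns. The three-column DT and the "_num" suffix are assumptions made up for illustration, not part of the original post.

```r
library(data.table)

# Small stand-in for the question's table (assumption: three factor columns)
DT <- data.table(V1 = factor(1:10), V2 = factor(1:10), V3 = factor(1:10))

cols     <- c("V1", "V2", "V3")
new_cols <- paste0(cols, "_num")  # hypothetical names; any character vector of the same length should work

# The parenthesised left-hand side is evaluated, so the results land in new_cols
# and the original factor columns are left untouched
DT[, (new_cols) := lapply(.SD, as.numeric), .SDcols = cols]

str(DT)
```

Because the target names are an ordinary character vector, this should work just as well when the column names are arbitrary.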

Both should be efficient even for a big data set. You can read some more about data.table basics here

Beware, though, of converting factors to numeric this way: as.numeric() on a factor returns the internal integer level codes rather than the labels. See here for more details
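
To make that caveat concrete, here is a minimal sketch using a factor whose labels differ from its level codes (the vector f is made up for illustration). With factor(1:10) from the question the two happen to coincide, which hides the difference.

```r
# Illustrative factor: the labels are "10", "20", "30"; the level codes are 1, 2, 3
f <- factor(c("10", "20", "30"))

as.numeric(f)                # level codes: 1 2 3
as.numeric(as.character(f))  # labels as numbers: 10 20 30
as.numeric(levels(f))[f]     # same result, converting only the levels
```

If the labels are what you want, either of the last two forms can be passed to lapply() in place of as.numeric, for example as function(x) as.numeric(as.character(x)).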

answered Jul 29 '15 at 09:14

David Arenburg

Many thanks, that was it. Especially the second way. I prefer to use names than indexes. – Tomasz Jerzyński Jul 30 '15 at 08:31
You can use names in both options, see my edit. – David Arenburg Jul 30 '15 at 08:38
