R data.table multi column recode/sub-assign

Question

Let DT be a data.table:

DT<-data.table(V1=sample(10),
               V2=sample(10),
               ...
               V9=sample(10))

Is there a better/simpler method to do multicolumn recode/sub-assign like this:

DT[V1==1 | V1==7,V1:=NA]
DT[V2==1 | V2==7,V2:=NA]
DT[V3==1 | V3==7,V3:=NA]
DT[V4==1 | V4==7,V4:=NA]
DT[V5==1 | V5==7,V5:=NA]
DT[V6==1 | V6==7,V6:=NA]
DT[V7==1 | V7==7,V7:=NA]
DT[V8==1 | V8==7,V8:=NA]
DT[V9==1 | V9==7,V9:=NA]

The variable names are completely arbitrary and do not necessarily contain numbers. There are many columns (Vx:Vx) and a single recode pattern for all of them (NAME==1 | NAME==7, NAME:=something).

Furthermore, how can NAs be sub-assigned to something else across multiple columns? E.g., in data.frame style:

data[,columns][is.na(data[,columns])] <- a_value

It's not very `data.table`ish, but you could simply do `is.na(DT) <- (DT == 7 | DT == 1)` (if your data set isn't too big). — David Arenburg, Jul 30 '15 at 10:10

score 7 · Accepted Answer · edited Jun 20 '20 at 09:12

You could use `set` to replace values in multiple columns. According to `?set`, it is fast because the overhead of `[.data.table` is avoided. We loop over the columns with a `for` loop and, in each column `j`, assign `NA` to the rows `i` whose value is 1 or 7.

    for (j in seq_along(DT)) {
      set(DT, i = which(DT[[j]] %in% c(1, 7)), j = j, value = NA)
    }
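Since the question mentions arbitrary column names, the loop can also run over a chosen subset of columns by name. The following is only a minimal sketch: the column names `id`, `a`, `b` and the vector `cols` are made up for illustration and do not come from the question.

```r
library(data.table)

# Hypothetical example data; the column names are illustrative only
DT <- data.table(id = 1:5,
                 a  = c(1L, 2L, 7L, 4L, 5L),
                 b  = c(7L, 7L, 3L, 1L, 9L))

# Assumption: only these columns should be recoded, `id` is left alone
cols <- c("a", "b")

for (j in cols) {
  # rows where column j holds 1 or 7; set() modifies DT by reference
  set(DT, i = which(DT[[j]] %in% c(1L, 7L)), j = j, value = NA)
}

print(DT)
```

Because `set` accepts column names as well as column numbers for `j`, no conversion of `cols` to positions is needed.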

EDIT: Included @David Arenburg's comments.

data

library(data.table)
set.seed(24)
DT <- data.table(V1 = sample(10), V2 = sample(10), V3 = sample(10))
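The second part of the question (replacing NAs in several columns with another value) could probably follow the same `set` loop pattern. This is a sketch under assumptions: the small example table and the replacement value `0L` are chosen here for illustration, and `a_value` is simply the name used in the question.

```r
library(data.table)

# Hypothetical data containing some NAs
DT <- data.table(V1 = c(NA, 2L, 3L),
                 V2 = c(4L, NA, NA))

# Assumed replacement value; it should match the column type (integer here)
a_value <- 0L

for (j in seq_along(DT)) {
  # rows of column j that are NA get a_value, by reference
  set(DT, i = which(is.na(DT[[j]])), j = j, value = a_value)
}

print(DT)
```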

answered Jul 30 '15 at 10:04

akrun

How would you use the `for` loop to replace one set of values with another set of values? For example, replacing `c(1, 2, 3)` with `c(4, 5, 6)`, that is, replacing 1 with 4, 2 with 5, and 3 with 6? – johnny Apr 23 '20 at 04:36
For replacing a set of values, you could use the approach given in [this answer](https://stackoverflow.com/a/16846530/3705612) or [this answer](https://stackoverflow.com/a/37720969/3705612). For instance, this would do it for integer columns: `DT[, (names(DT)) := lapply(.SD, function(x) { fifelse(x %in% 1L:3L, yes = fcase(x == 1L, 4L, x == 2L, 5L, x == 3L, 6L), no = x)})]`. – Corey N. May 23 '23 at 00:32
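One way the `for`/`set` loop itself might be adapted to map one set of values onto another is to look up each value with `match`. This is only a sketch: the vectors `old` and `new` and the example table are assumptions introduced here, not part of the original thread.

```r
library(data.table)

# Hypothetical integer data
DT <- data.table(V1 = c(1L, 2L, 3L, 9L),
                 V2 = c(3L, 3L, 8L, 1L))

old <- c(1L, 2L, 3L)  # values to replace
new <- c(4L, 5L, 6L)  # replacements, same order and type as `old`

for (j in seq_along(DT)) {
  pos  <- match(DT[[j]], old)  # position of each value in `old`, NA if absent
  rows <- which(!is.na(pos))   # rows that need recoding
  if (length(rows)) {
    # assign the matching replacement to each selected row, by reference
    set(DT, i = rows, j = j, value = new[pos[rows]])
  }
}

print(DT)
```

Because every lookup is done against the original column values in a single pass, a replacement value is never recoded a second time, even if `old` and `new` overlap.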
