6

I've come across this post on how to replace occurrences of a number in all columns of a data frame (e.g. replace all 4 by 10 in all columns): DF[DF == 4] <- 10. With data tables the same results can be achieved in exactly the same way: DT[DT == 4] <- 10.

However, how should I procede if I want to apply this modification but only to specific columns from the data table, whether these columns are specified by position (e.g. 2:4) or by name (e.g. c("V2", "V3", "V4"))?

I will favor an "elegant" solution rather than iterations over every column.

mat
  • 2,412
  • 5
  • 31
  • 69

1 Answers1

5

We can use set which would be more efficient

for(j in 2:4) {
  set(DT, i = which(DT[[j]]==4), j=j, value = 10)
 }
DT
#   V1 V2 V3 V4
#1:  A  2  2 10
#2:  B  1 10 10
#3:  C  3 10  3
#4:  D  3  2 10
#5:  E  3  3  3
#6:  F 10  3  3

The above can be done with column names as well

for(j in names(DT)[2:4]){
   set(DT, i = which(DT[[j]]==4), j=j, value = 10)
 }

Or another option is to specify the .SDcols with the columns of interest (either the numeric index or the column names), loop through the Subset of Data.table (.SD), replace the values that are 4 to 10 and assign (:=) the output back to columns of interest

DT[, (2:4) := lapply(.SD, function(x) replace(x, x==4, 10)), .SDcols = 2:4]

Or with column names

DT[, (names(DT)[2:4]) := lapply(.SD, function(x) replace(x, x==4, 10)), 
      .SDcols = names(DT)[2:4]]

data

set.seed(24)
DT <- data.table(V1 = LETTERS[1:6], V2 = sample(1:4, 6, replace = TRUE), 
   V3 = sample(2:4, 6, replace = TRUE), V4 = sample(3:4, 6, replace= TRUE))
akrun
  • 874,273
  • 37
  • 540
  • 662
  • 2
    This works perfectly! I would just suggest you to add one more line of code but using column names to make it clear for everyone. – mat Jun 04 '17 at 11:11