40

I have a large data.table in R with several columns with dollar values. In a different column I have an inflation adjustment number. I am trying to figure out how to update each of my monetary columns with it multiplied by the inflation adjustment column. Suppose I have the data:

   DT <- data.table(id=1:1000,year=round(runif(1000)*10), 
          inc1 = runif(1000), inc2 = runif(1000), inc3 = runif(1000),    
          deflator = rnorm(1000))

which gives output:

             id year      inc1      inc2       inc3    deflator
   1:    1    8 0.4754808 0.6678110 0.41533976 -0.64126988
   2:    2    2 0.6568746 0.7765634 0.70616373  0.39687915
   3:    3    6 0.8192947 0.9236281 0.90002534 -0.69545700
   4:    4    4 0.7781929 0.1624902 0.17565790  0.05263055
   5:    5    7 0.6232520 0.8024975 0.86449836  0.70781887
  ---                                                     
 996:  996    2 0.9676383 0.2238746 0.19822000  0.78564836
 997:  997    9 0.9877410 0.5783748 0.57497438 -1.63365223
 998:  998    8 0.2220570 0.6500632 0.19814932  1.00260174
 999:  999    3 0.4793767 0.2830457 0.54835581  1.04168818
1000: 1000    8 0.2003476 0.6121637 0.02921505  0.34933690

in reality I have inc1 - inc100, rather than just three variables and I want to figure out a way to perform this action:

DT[, inc1 := inc1 * deflator]

for each of my 100 income columns (inc1, inc2, inc3 in the fake data above). I will have more than 100 columns in the future, so I would like to figure out a way to loop the action over the columns. Is there a way to do this for all the income columns at once?

I would like to do something like:

inc_cols = c(inc1, inc2, inc3)

DT[, inc_cols := lapply(inc_cols,function(x)= x * deflator),]

or

DT[, inc_cols := lapply(.SD,function(x)= x * deflator),.SDcols = inc_cols]

but neither of these seem to work. I also tried using the get() function to make it clear deflator is a referencing a column, like:

DT[, inc_cols := lapply(.SD,function(x)= x * get(deflator)),.SDcols = inc_cols]

but had no luck. I also tried to loop through the variables with something like:

for (var in inc_cols) {
  print(var)
  DT[, get(var) := get(var) *infAdj2010_mult] 
}

which returns

[1] "inc1"
 Error in get(var) : object 'inc1' not found 

I realize this is probably a straight forward question and I have tried to search the other questions here and various online guides and tutorials, but I cannot find an example matching my specific problem. It is similar to this question, but not exactly.

Thanks for your help!

talat
  • 68,970
  • 21
  • 126
  • 157
johneric
  • 955
  • 2
  • 8
  • 11
  • 2
    FYI to anyone viewing this now, I guess `DT[, inc_cols := lapply(.SD,function(x)= x * deflator),.SDcols = inc_cols]` didn't work at the time of this question, but it works in the current version of `data.table` – IceCreamToucan Jul 16 '18 at 13:48

4 Answers4

30

Since you can use dplyr on data.tables, you could also do:

library(dplyr)
DT %>% mutate_each(funs(.*deflator), starts_with("inc"))

Which will multiply each column of DT that starts with "inc" by the "deflator" column.

talat
  • 68,970
  • 21
  • 126
  • 157
  • This also worked, thank you. I went ahead and awarded the first post the check since it solved the problem along the lines I was already taking, but I also appreciate your help. – johneric Jan 24 '15 at 23:49
  • 5
    mutate_each() is now deprecated in favor of mutate_at(). The code could now be written as `DT %>% mutate_at(vars(starts_with("inc")), ~.*deflator)`. – Jean V. Adams Mar 14 '20 at 18:05
  • 1
    Now with dplyr 1.0 something like this is *de rigeur*: `DT %>% mutate(across(starts_with("inc"),~.*deflator)` – Mark Neal Jun 14 '20 at 09:47
  • Maybe it needs a second closing bracket before the comma? – Mark Neal Oct 09 '20 at 04:15
29

You could try

DT[, (inc_cols) := lapply(.SD, function(x) 
        x * DT[['deflator']] ), .SDcols = inc_cols]
head(DT1,2)
#   id year         inc1         inc2       inc3   deflator
#1:  1    3  0.614838304  0.009796974  0.3236051  0.7735552
#2:  2    2 -0.001583579 -0.082289606 -0.1365115 -0.6644330

Or if you need a loop

for(inc in inc_cols){
  nm1 <- as.symbol(inc)
  DT[,(inc):= eval(nm1)*deflator]
}

 head(DT,2)
 #  id year         inc1         inc2       inc3   deflator
 #1:  1    3  0.614838304  0.009796974  0.3236051  0.7735552
 #2:  2    2 -0.001583579 -0.082289606 -0.1365115 -0.6644330

Or a possible option using set which should be very fast as the overhead of [.data.table is avoided (suggested by @Arun)

indx <- grep('inc', colnames(DT))

for(j in indx){
 set(DT, i=NULL, j=j, value=DT[[j]]*DT[['deflator']])
}
head(DT,2)
#  id year         inc1         inc2       inc3   deflator
#1:  1    3  0.614838304  0.009796974  0.3236051  0.7735552
#2:  2    2 -0.001583579 -0.082289606 -0.1365115 -0.6644330

where

inc_cols <-  grep('^inc', colnames(DT), value=TRUE)

data

set.seed(24)
DT <- data.table(id=1:1000,year=round(runif(1000)*10), 
      inc1 = runif(1000), inc2 = runif(1000), inc3 = runif(1000),    
      deflator = rnorm(1000)) 
Arun
  • 116,683
  • 26
  • 284
  • 387
akrun
  • 874,273
  • 37
  • 540
  • 662
  • 1
    for-loop would be great with `set()`, as it avoids the `[.data.table` overhead. – Arun Jan 24 '15 at 09:06
  • 3
    @Arun Thanks, Not sure if this is what you meant. `for(i in 3:5){ set(DT, i=NULL, j=i, value=DT[[i]]*DT[['deflator']]) }` – akrun Jan 24 '15 at 12:03
  • 2
    Exactly what I meant! Perfect. – Arun Jan 24 '15 at 14:10
  • Thanks, this worked great! data.table is amazing, but the syntax is tricky, I appreciate the help. – johneric Jan 24 '15 at 23:48
  • Is this possible if you also want to group by another variable? – Jeff Groh Jan 27 '22 at 23:34
  • @JeffGroh in the first option, you can add `by` i.e `DT[, (inc_cols) := lapply(.SD, function(x) x * DT[['deflator']] ), by = yourcol, .SDcols = inc_cols]` – akrun Jan 28 '22 at 16:09
2

This approach is also quite convenient, but likely slower than using set():

library(data.table); library(magrittr)
set.seed(42)
DT <- data.table(id=1:1000,year=round(runif(1000)*10),
          inc1 = runif(1000), inc2 = runif(1000), inc3 = runif(1000),
          deflator = rnorm(1000))
vars <- names(DT) %>% .[grepl("inc", .)]
DT[, (vars) := .SD * deflator, .SDcols = vars]
DT[]

        id year         inc1        inc2        inc3   deflator
   1:    1    9  0.212563676  0.24806366  0.06860638  0.2505781
   2:    2    9 -0.017438715 -0.12186792 -0.26241497 -0.2779240
   3:    3    3 -1.414016119 -1.20714809 -0.76920337 -1.7247357
   4:    4    8 -1.082336969 -1.78411512 -1.08720698 -2.0067049
   5:    5    6 -0.644638321 -1.07757416 -0.20895576 -1.2918083
  ---                                                          
 996:  996    1 -0.573551720 -1.93996157 -0.50171303 -2.1569621
 997:  997    5 -0.007899417 -0.01561619 -0.05708009 -0.0920275
 998:  998    1 -0.090975121 -0.30475714 -0.27291825 -0.3974001
 999:  999    5 -0.045984079 -0.01563942 -0.07868934 -0.1383273
1000: 1000    0 -0.785962308 -0.63266975 -0.29247974 -0.8257650
chandler
  • 716
  • 8
  • 15
0

@IceCreamToucan noted that the following should work in newer versions of DT.

DT[, inc_cols := lapply(.SD,function(x)= x * deflator),.SDcols = inc_cols] 

For me, I had to do the following to LHS to make it work. Also, see RHS edit.

DT[, c(inc_cols) := lapply(.SD, function(x) x * deflator), .SDcols = inc_cols] 

-or-

DT[, (inc_cols) := lapply(.SD, function(x) x * deflator), .SDcols = inc_cols] 
AcK
  • 2,063
  • 2
  • 20
  • 27