How to apply function over subset of columns in data.table while grouping by some other column?

Question

As an example take this data.table:

library(data.table)
foo <- data.table(id = letters[1:5], group = c('a', 'a', 'a', 'b', 'b'), x = 1:5, y = (-4):0, z = 2:6)

   id group x  y z
1:  a     a 1 -4 2
2:  b     a 2 -3 3
3:  c     a 3 -2 4
4:  d     b 4 -1 5
5:  e     b 5  0 6

I want to normalize the columns x, y, and z (x/sum(x)) within each group defined by the column group, while keeping all the remaining columns.

I am trying something along these lines:

foo[, lapply(.SD[, -1], function(x) {x/sum(x)}), by = group]

   group         x         y         z
1:     a 0.1666667 0.4444444 0.2222222
2:     a 0.3333333 0.3333333 0.3333333
3:     a 0.5000000 0.2222222 0.4444444
4:     b 0.4444444 1.0000000 0.4545455
5:     b 0.5555556 0.0000000 0.5454545

But the column id is dropped because of .SD[, -1], and I do not know how to apply the function over the numeric columns only without dropping it.

akrun · Accepted Answer · 2018-06-16T16:03:29.797

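We could specify the columns of interest in .SDcols and assign the output back to the same columns with :=. Because := updates foo by reference, the other columns (id, group) are left as they are.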

foo[, names(foo)[3:5] := lapply(.SD, function(x) x/sum(x)),
    by = group, .SDcols = x:z]

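Note that a grouped := expects the output to have the same type as the column it overwrites. Here x, y, and z are integer while x/sum(x) is numeric (double), so the assignment above would run into a type mismatch. So, change the class of the columns to numeric first and then do the grouped assignment: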

nm1 <- names(foo)[3:5]
# or programmatically, by checking which columns are numeric
# nm1 <- names(foo)[sapply(foo, is.numeric)]
foo[, (nm1) := lapply(.SD, as.numeric), .SDcols = nm1
      ][, (nm1) := lapply(.SD, function(x) x/sum(x)), 
                by = group, .SDcols = nm1][]
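As a way to check the result, here is a minimal self-contained sketch of the same two-step idea. It assumes that working on a copy is acceptable (the name res is only illustrative) and verifies that every normalized column sums to 1 within each group while id is kept.

```r
library(data.table)

foo <- data.table(id = letters[1:5], group = c('a', 'a', 'a', 'b', 'b'),
                  x = 1:5, y = (-4):0, z = 2:6)

# Work on a copy so that foo itself is not modified (:= updates by reference).
# 'res' is just an illustrative name.
res <- copy(foo)

# Columns to normalize: every numeric column (here x, y and z)
nm1 <- names(res)[sapply(res, is.numeric)]

# Step 1: integer -> double, so the grouped assignment keeps a consistent type
res[, (nm1) := lapply(.SD, as.numeric), .SDcols = nm1]

# Step 2: normalize within each group
res[, (nm1) := lapply(.SD, function(v) v / sum(v)), by = group, .SDcols = nm1]

# Sanity check: each normalized column should sum to 1 in every group
check <- res[, lapply(.SD, sum), by = group, .SDcols = nm1]
print(res)
print(check)
```

If modifying foo in place is fine, the copy() call can be dropped and the two := steps applied to foo directly, as in the chained version above.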

A tidyverse approach to the above would be

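library(dplyr)
# funs() and mutate_if() are deprecated/superseded in current dplyr;
# across() with where() (dplyr >= 1.0.0) does the same job
foo %>% 
     group_by(group) %>%
     mutate(across(where(is.numeric), ~ .x / sum(.x)))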
