I have a data.table and want to take a linear combination of the columns. How should I do it?

The setup

require(data.table)
set.seed(1)

DT <- data.table(A = rnorm(10),
                 B = rnorm(10),
                 C = rnorm(10),
                 D = rnorm(10),
                 coefA = rnorm(10),
                 coefB = rnorm(10),
                 coefC = rnorm(10),
                 coefD = rnorm(10))

I can do the following:

DT[, sum := A*coefA + B * coefB + C * coefC + D * coefD]

Is there a better way to solve this?

morningfin
  • Given your question, No there is no better way – statquant Jun 29 '17 at 18:29
  • In this instance, you're probably better off working with matrices. Here is one way in base R: `myMat <- as.matrix(DT)` to convert to a matrix and then `rowSums(myMat[, 1:4] * myMat[, 5:8])` to compute the dot product. – lmo Jun 29 '17 at 18:34
  • Those are not linear combinations. In case anyone lands here based on the title, the proper ref is my question, I guess https://stackoverflow.com/questions/19279075/efficiently-computing-a-linear-combination-of-data-table-columns – Frank Jun 29 '17 at 22:50

3 Answers

One option is

DT[, sum := Reduce(`+`, DT[, 1:4] * DT[, 5:8])]

Or using .SD

DT[, sum := Reduce(`+`, .SD[, 1:4] * .SD[, 5:8])]

Or we can do

nm1 <- names(DT)[1:4]
nm2 <- paste0("coef", nm1)
DT[, sum := Reduce(`+`, Map(`*`, mget(nm1), mget(nm2)))]
akrun

With dplyr:

library(dplyr)  # provides %>% and mutate()
DT %>% mutate(sum = A*coefA + B*coefB + C*coefC + D*coefD)
Dan

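Assuming you want a more general method because you may not always have exactly four value/coefficient pairs, the following works for any number of pairs (for example after adding E, F, G and coefE, coefF, coefG), as long as the value columns and the coefficient columns appear in the same order: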

coefcols <- names(DT)[grepl("coef", names(DT))]
valucols <- names(DT)[!grepl("coef", names(DT))]
DT[, sum := apply(DT[, ..valucols] * DT[, ..coefcols], 1, sum)]

Edit: After reading @lmo's comment, I realized that the last line can be simplified using rowSums:

DT[, sum := rowSums(DT[, ..valucols] * DT[, ..coefcols])]
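If the column order cannot be relied on, one possible variation is to pair the columns by name instead of by position. The following is only a sketch under stated assumptions: every coefficient column is named "coef" followed by the name of its value column (as in the question), and the result column name lincomb is made up for illustration. Deriving the value names from the coefficient names also keeps a previously added result column such as sum from being picked up as a value column.

```r
library(data.table)
set.seed(1)

# Small example table; the columns are deliberately interleaved.
DT <- data.table(A = rnorm(5), coefA = rnorm(5),
                 B = rnorm(5), coefB = rnorm(5))

# Assumption: each coefficient column is "coef" + the value column's name.
coefcols <- grep("^coef", names(DT), value = TRUE)
valucols <- sub("^coef", "", coefcols)
stopifnot(all(valucols %in% names(DT)))

# Convert both sets to matrices so the element-wise product is positional
# within the name-matched pairs, then sum across each row.
vals  <- as.matrix(DT[, valucols, with = FALSE])
coefs <- as.matrix(DT[, coefcols, with = FALSE])

# lincomb is a hypothetical column name.
DT[, lincomb := rowSums(vals * coefs)]
```

Because valucols is built from coefcols, the i-th value column always lines up with the i-th coefficient column, whatever order they have in the table.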
Eric Watt