
I'm wondering if there is a faster way to multiply a set of columns by the values in a vector (essentially applying a regression formula). I've seen this post, but it requires converting the data.table to a matrix, which is inefficient.

I've also seen this old post that uses a for loop, but the loop I tested is much, much slower than what I have here, and given the age of the post I'm wondering if there are other ways.

Any other suggestions?

library(data.table)
library(microbenchmark)

set.seed(100)
N = 2e7L
DT1 = data.table(id = sample(letters[1:10], N, TRUE),
                 a = sample(0:3, N, TRUE),
                 b = sample(0:3, N, TRUE),
                 c = sample(0:3, N, TRUE),
                 d = sample(0:3, N, TRUE))

v <- c(0.2232332, 0.332424, 0.3322525250, 0.32342323432)

# 528.4188 ms
microbenchmark({
  DT1[,blah:=a*v[1] + b*v[2]  + c*v[3] + d*v[4]]
}, times = 1, unit = "ms")
ZRoss
  • if this is calculating predictions from a model, did you compare the speed of the `predict` method? – arvi1000 Dec 06 '17 at 19:33
  • So what is wrong with your method? – David Arenburg Dec 06 '17 at 20:51
  • It's actually not a model, so I can't use predict. There is nothing wrong with the method except that it's slow. In this example it's 528 ms, but with my real data, which is tens of millions of records, it takes a very long time. With a matrix you can use `%*%`, which seems quite efficient from a code perspective; I'm wondering if there is a "data.table way" to do this here. – ZRoss Dec 06 '17 at 20:54
  • I don't think anything could possibly be faster, except perhaps `set(DT1, j = "blah2", value = DT1[["a"]]*v[1] + DT1[["b"]]*v[2] + DT1[["c"]]*v[3] + DT1[["d"]]*v[4])`, as you aren't doing anything data.table-ish here, just a simple vector multiplication. If you can just work with matrices, then that will probably be better indeed. – David Arenburg Dec 06 '17 at 21:08
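
For anyone wanting to compare the options raised in the comments, here is a minimal sketch that puts them side by side on a much smaller table than the one in the question. The table name `DT`, the helper vector `cols`, and the output column names (`blah_set`, `blah_reduce`, `blah_mat`) are made up for illustration. The `Reduce`/`Map` variant is not from the thread; it is just one way to avoid typing out each term when there are many columns. No timings are claimed here, so it is worth benchmarking on your own data.

```r
library(data.table)

# Small illustrative table (N is far smaller than in the question)
set.seed(100)
N <- 1e5L
DT <- data.table(a = sample(0:3, N, TRUE),
                 b = sample(0:3, N, TRUE),
                 c = sample(0:3, N, TRUE),
                 d = sample(0:3, N, TRUE))
v <- c(0.2232332, 0.332424, 0.3322525250, 0.32342323432)
cols <- c("a", "b", "c", "d")  # hypothetical helper: columns matched to v by position

# 1. Baseline from the question: explicit formula with :=
DT[, blah := a*v[1] + b*v[2] + c*v[3] + d*v[4]]

# 2. set(), as suggested in the comments: same arithmetic, without going through `[.data.table`
set(DT, j = "blah_set",
    value = DT[["a"]]*v[1] + DT[["b"]]*v[2] + DT[["c"]]*v[3] + DT[["d"]]*v[4])

# 3. Generic over any number of columns: scale each column by its weight, then sum
DT[, blah_reduce := Reduce(`+`, Map(`*`, .SD, v)), .SDcols = cols]

# 4. Matrix product: as.matrix() copies the selected columns before %*%
DT[, blah_mat := as.vector(as.matrix(.SD) %*% v), .SDcols = cols]

# All four results should agree up to floating-point rounding
stopifnot(isTRUE(all.equal(DT$blah, DT$blah_set)),
          isTRUE(all.equal(DT$blah, DT$blah_reduce)),
          isTRUE(all.equal(DT$blah, DT$blah_mat)))
```

Wrapping each variant in `microbenchmark()` with the full 2e7-row table would show whether any of them beats the 528 ms baseline on a given machine.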
