
Here's some sample input data and a simplified function.

require(data.table)
sampleDT <- data.table(c1 = c(1,2,3), c2 = c(4,5,6))
print(sampleDT)
   c1 c2
1:  1  4
2:  2  5
3:  3  6
testF <- function(x = NULL, y = NULL) {
  return(list(x+y,x))
}

resultCol <- c("r1","r2")
sampleDT[, (resultCol) := testF(c1,c2), by = seq(nrow(sampleDT))]
print(sampleDT)
   c1 c2 r1 r2
1:  1  4  5  1
2:  2  5  7  2
3:  3  6  9  3

The actual function can't easily be vectorized, and it returns a 1*n list.

I'm looking for a parallel solution to this row-wise operation. Also, if there are multiple ways of constructing the parallel process, which one is optimized for speed?

Please include some sample code, because I'm not familiar with the syntax (e.g. foreach, mclapply, etc.).
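A minimal sketch of what such a solution could look like, using mcmapply from the base parallel package (the multicore analogue of mapply/Map). This is only an illustration, assuming the real testF is expensive enough per row to justify the parallel overhead; mcmapply relies on forking, so on Windows it falls back to running serially. The toy testF above stands in for the real function.

library(data.table)
library(parallel)

sampleDT <- data.table(c1 = c(1, 2, 3), c2 = c(4, 5, 6))
testF <- function(x = NULL, y = NULL) list(x + y, x)  # stand-in for the real row-wise function
resultCol <- c("r1", "r2")

# one task per row; SIMPLIFY = FALSE keeps the result as a list of 2-element lists,
# exactly what Map(testF, c1, c2) would return
res <- mcmapply(testF, sampleDT$c1, sampleDT$c2,
                SIMPLIFY = FALSE, mc.cores = 2)

# bind the per-row lists back into the two result columns
sampleDT[, (resultCol) := rbindlist(res)]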

LeGeniusII
  • In this case, `by` is not needed: `sampleDT[, (resultCol) := testF(c1, c2)]` – akrun Feb 14 '19 at 05:10
  • @akrun I have to do it by row, because the function `testF` is not vectorized. – LeGeniusII Feb 14 '19 at 05:15
  • If it takes multiple arguments, then use `Map`, i.e. `sampleDT[, (resultCol) := Map(testF, c1, c2)]` – akrun Feb 14 '19 at 05:18
  • Note that if you use the `Map` suggestion by @akrun, you might have to bind your result using `rbindlist`, i.e. `sampleDT[, (resultCol) := rbindlist(Map(testF, c1, c2))]`, to force data.table to interpret the list correctly. – Oliver Feb 14 '19 at 08:14
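Building on the Map + rbindlist pattern from the comments above, a foreach/doParallel variant might look like the sketch below. Again, this is only an illustration, not a benchmarked answer; depending on your setup you may also need foreach's .export / .packages arguments so that testF and the packages it needs are visible to the workers.

library(data.table)
library(foreach)
library(doParallel)

cl <- makeCluster(2)    # pick a worker count that suits your machine
registerDoParallel(cl)

# iterate over c1 and c2 in lockstep, one task per row
res <- foreach(x = sampleDT$c1, y = sampleDT$c2) %dopar% testF(x, y)

stopCluster(cl)

# same rbindlist trick as in the comments: turn the list of 2-element lists into two columns
sampleDT[, (resultCol) := rbindlist(res)]

Which variant is faster depends mainly on how expensive the real testF is per row; with a cheap function the forking/cluster overhead can easily dominate, so benchmarking on the real workload (e.g. with system.time or the microbenchmark package) is the only reliable way to decide.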

0 Answers