I am trying to use the multicore function parallel() with data.table and can't work out the right way to do it. Code:

require(multicore)
require(data.table)
dtb = data.table(a=1:10, b=1:2)
x = dtb[,parallel(a+1),by=b]

> x
   b   pid fd
1: 1 12243  3
2: 1 12243  6
3: 2 12247  4
4: 2 12247  8

I would like to call collect() on this but these are no longer parallel objects. How should one do this?

Alex
  • What's wrong with `x = dtb[, collect(parallel(a+1)),by=b]`? – Ryogi Feb 04 '13 at 23:39
  • collect waits for parallel to finish – Alex Feb 05 '13 at 02:20
  • @alex it's difficult to understand why collect() waiting for parallel() is a problem. It's also difficult to extrapolate your example to a real problem, and therefore know what you really want from this. Can you help us understand? – ndoogan Mar 10 '13 at 15:44

1 Answer

I think this is along the lines of what you want:

collect(dtb[, list(jobs = list(parallel(a+1))), by = b][, jobs])

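You no longer had parallel objects, and so couldn't call collect(), because returning the job directly from j made data.table treat it as a plain list and spread its elements (pid, fd) into columns. That converts the job to a list instead of storing it in a list. Wrapping it in list(), as above, keeps each job object intact in one cell of a list column.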
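For readers on a current R installation, here is a minimal sketch of the same idea. It assumes the base parallel package, whose mcparallel() and mccollect() took over from multicore's parallel() and collect(). The names job, jobs and res are only illustrative, and because these functions rely on forking, this should only be expected to work on Unix-alikes, not on Windows.

```r
# Sketch only: parallel::mcparallel()/mccollect() stand in for
# multicore's parallel()/collect(). Fork-based, so Unix-alikes only.
library(parallel)
library(data.table)

dtb <- data.table(a = 1:10, b = 1:2)

# Wrap each job in list() so the job object itself lands in one cell of a
# list column instead of being spread into pid/fd columns.
jobs <- dtb[, list(job = list(mcparallel(a + 1))), by = b][, job]

# Every element of jobs is still a job object, so mccollect() accepts the list
# and returns one result per group.
res <- mccollect(jobs)
```

Pulling the column out with [, job] gives an ordinary list of job objects, which is the form mccollect() expects.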

eddi