From ?setDTthreads:
Internally parallelized code is used in the following places:

- between.c - between()
- cj.c - CJ()
- coalesce.c - fcoalesce()
- fifelse.c - fifelse()
- fread.c - fread()
- forder.c, fsort.c, and reorder.c - forder() and related
- froll.c, frolladaptive.c, and frollR.c - froll() and family
- fwrite.c - fwrite()
- gsumm.c - GForce in various places, see GForce
- nafill.c - nafill()
- subset.c - Used in [.data.table subsetting
- types.c - Internal testing usage
My understanding is that you should not expect data.table to make use of multithreading outside of the above use cases. Note that [.data.table uses multithreading for subsetting only, i.e., in i-expressions but not j-expressions. That is presumably just to speed up relational and logical operations, as in x[!is.na(a) & a > 0].
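As a rough check (my own sketch, not part of the documentation), you can time the same i-expression under different thread counts; on a large enough table, the row-gathering step should benefit from more threads:

```r
library("data.table")
x <- data.table(a = rnorm(2^25))

# Time the identical i-subset with 1 thread vs. 4 threads.
# Only the row-gathering in subset.c is parallelized; evaluating
# !is.na(a) & a > 0 itself is ordinary base R.
setDTthreads(1L)
t1 <- system.time(x[!is.na(a) & a > 0])
setDTthreads(4L)
t4 <- system.time(x[!is.na(a) & a > 0])
t1["elapsed"]; t4["elapsed"]
```

Any speed-up will be machine-dependent and modest, since only part of the subset operation is parallelized.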
In a j-expression, sum and sapply are still just base::sum and base::sapply. You can test this with a benchmark:
library("data.table")
setDTthreads(4L)
x <- data.table(a = rnorm(2^25))
microbenchmark::microbenchmark(sum(x$a), x[, sum(a)], times = 1000L)
Unit: milliseconds
expr min lq mean median uq max neval
sum(x$a) 51.61281 51.68317 51.95975 51.84204 52.09202 56.67213 1000
x[, sum(a)] 51.78759 51.89054 52.18827 52.07291 52.33486 61.11378 1000
x <- data.table(a = seq_len(1e+04L))
microbenchmark::microbenchmark(sapply(x$a, paste, "is a good number"), x[, sapply(a, paste, "is a good number")], times = 1000L)
Unit: milliseconds
expr min lq mean median uq max neval
sapply(x$a, paste, "is a good number") 14.07403 15.7293 16.72879 16.31326 17.49072 45.62300 1000
x[, sapply(a, paste, "is a good number")] 14.56324 15.9375 17.03164 16.48971 17.69045 45.99823 1000
where it is clear that simply putting code into a j-expression does not improve performance.
data.table does recognize and handle certain constructs exceptionally. For instance, data.table uses its own radix-based forder instead of base::order when it sees x[order(...)]. (This feature is somewhat redundant now that users of base::order can request data.table's radix sort by passing method = "radix".) I haven't seen a "master list" of such exceptions.
As for whether using, e.g., parallel::mclapply inside of a j-expression can have performance benefits, I think the answer (as usual) depends on what you are trying to do and the scale of your data. Ultimately, you'll have to do your own benchmarks and profiling to find out. For example:
library("parallel")
cl <- makePSOCKcluster(4L)
microbenchmark::microbenchmark(x[, sapply(a, paste, "is a good number")], x[, parSapply(cl, a, paste, "is a good number")], times = 1000L)
stopCluster(cl)
Unit: milliseconds
expr min lq mean median uq max neval
x[, sapply(a, paste, "is a good number")] 14.553934 15.982681 17.105667 16.585525 17.864623 48.81276 1000
x[, parSapply(cl, a, paste, "is a good number")] 7.675487 8.426607 9.022947 8.802454 9.334532 25.67957 1000
So it is possible to see speed-up, though sometimes you pay the price in memory usage. For small enough problems, the overhead associated with R-level parallelism can definitely outweigh the performance benefits.
You'll find a good thread about integrating parallel and data.table (including reasons not to) here.