
I'm trying to improve the runtime of an R package that frequently calls the `by` function, `by(data, INDICES, FUN, ..., simplify = TRUE)`. Does a parallelisable version of this function exist?

According to its documentation, `by` is a wrapper for `tapply`, which in turn can be replaced by `split` combined with `sapply`. If no parallelisable version of `by` exists, am I right to assume that the way to go is to unwrap the function down to that core and parallelise it?
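
As a minimal sketch of that unwrapping idea, assuming a single grouping vector and a `FUN` that takes one data-frame subset, `split()` can partition the rows and `parallel::mclapply()` (base R's `parallel` package) can apply `FUN` to each group. The helper name `par_by` and its `cores` argument are hypothetical. Note that `mclapply()` relies on forking, which is not available on Windows, so the sketch falls back to one core there; on Windows a cluster with `parLapply()` would be needed instead.

```r
# Hypothetical helper: a parallel stand-in for by() built from split() + mclapply().
# Assumes a single grouping vector and a FUN that takes one data-frame subset.
par_by <- function(data, INDICES, FUN, ...,
                   cores = if (.Platform$OS.type == "windows") 1L else 2L) {
  # split() partitions the rows into one data frame per (non-empty) group level
  groups <- split(data, INDICES, drop = TRUE)
  # mclapply() forks worker processes; forking is unavailable on Windows, hence 1 core there
  parallel::mclapply(groups, FUN, ..., mc.cores = cores)
}

# Usage: mean Sepal.Length per species, analogous to by(iris, iris$Species, ...)
res <- par_by(iris, iris$Species, function(d) mean(d$Sepal.Length))
print(unlist(res))
```

Unlike `by()`, this returns a plain named list rather than an object of class `"by"`, so any `simplify = TRUE` behaviour would have to be recreated, for example with `unlist()` or `simplify2array()`.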

  • We need more specifics: what libraries, what is the task, what is the runtime problem, etc. Right now, this is too broad. Generally, any iterative process can be parallelized. – Parfait Feb 25 '19 at 15:50
  • We need https://stackoverflow.com/help/mcve and https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610#5963610. – Krantz Feb 26 '19 at 21:12
  • FWIW, it's on the to-do list to add a `future_by()` to the [future.apply](https://cran.r-project.org/web/packages/future.apply) package - see https://github.com/HenrikBengtsson/future.apply/issues/35 – HenrikB Feb 27 '19 at 14:25
  • UPDATE 2019-03-01: There is now a `future_by()` in the develop branch of [future.apply](https://github.com/HenrikBengtsson/future.apply/tree/develop). Please see https://github.com/HenrikBengtsson/future.apply/issues/35#issuecomment-468809259 for installation instructions. – HenrikB Mar 01 '19 at 21:17
