
I'm wondering if there's a way to process knitr code chunks in an RMarkdown document asynchronously.

Here's what I have in mind: imagine a document containing a complex data analysis broken down into several code chunks, some of which are slow to run. These code chunks have no dependencies on each other, and their output is a plot, a table, or some other numerical result, not a data object used by any other code chunk.

It would be great if I could parallelise the processing of these code chunks. knitr processes code chunks sequentially, so several quick chunks can be held up in a queue behind a single slow one. R packages like future and promises enable asynchronous programming, and I was wondering whether this could be leveraged to process knitr code chunks in parallel.

I'm aware that I can likely put the slow code chunks in separate Rmd files and then, in a code chunk, call knitr::knit_child inside a furrr::future_map call, but it would be nicer to keep everything together in the same file. I'm also aware that it is possible to specify child documents using the child option in a code chunk, and that I can reuse code chunks by calling them by name using the ref.label option. So what I'm wondering is whether there's any way to hijack any of this functionality (probably using future) so that rendering of a chunk's output is delayed while execution passes to subsequent chunks. Or something like that. I'm just exploring the space of possibilities for moving beyond sequential computation of code chunks in knitr towards multicore or multisession processing.
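For concreteness, the separate-Rmd-files approach I have in mind would look something like this, placed in a chunk with `results = "asis"` (untested sketch; the file names are made up, and I suspect calling `knitr::knit_child` from a fresh worker session, outside the parent knit, may need extra care with figure paths and chunk options):

```r
# Rough sketch: render each slow child document in a parallel R session,
# then splice the rendered markdown back into the main document.
# "slow-analysis-1.Rmd" and "slow-analysis-2.Rmd" are hypothetical files.
library(future)
plan(multisession)  # or multicore on Linux/macOS

children <- c("slow-analysis-1.Rmd", "slow-analysis-2.Rmd")

# Start all renders without waiting; each future runs in its own session.
jobs <- lapply(children, function(f) {
  future(knitr::knit_child(f, quiet = TRUE, envir = new.env()))
})

# value() blocks until each child finishes; with results = "asis" on this
# chunk, the children's markdown output is inserted into the document here.
cat(unlist(lapply(jobs, value)), sep = "\n")
```

This keeps the main document waiting only as long as the slowest child, rather than the sum of all of them, but it splits the analysis across files, which is what I'd like to avoid.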

  • It is an interesting idea. Not sure SO is the best place to ask for this though. I believe a solution for this would use {drake}. Maybe you should contact the authors of {knitr}, {future} and {drake} and tell them about your idea. – F. Privé Nov 05 '19 at 06:00
  • The sequential processing of code chunks ensures the reproducibility of an R Markdown document. And as far as I know, R is single-threaded by nature, so... – TC Zhang Nov 05 '19 at 07:38
  • * knitr Github page explicitly states I should post to SO and wait 24 hours before posting a question as an issue to Github * I'll investigate drake; thanks! * Yes, sequential processing ensures reproducibility, but if the chunks are independent of each other and have no downstream effects, there's no reason in principle why they cannot be processed asynchronously, if done carefully * R is single-threaded, but packages like future and promises enable multicore and multisession processing, so this is not a blocker * I'm British -- please don't edit my post to American English, thanks! – Dr. Andrew John Lowe Nov 05 '19 at 16:40
  • I don't think this is possible (without **a lot** of hacking, at least). Basically, knitr evaluates a chunk (or the respective cache content) and then prints the result. If chunks were evaluated asynchronously, all code would have to be evaluated first before any output is composed. That's structurally very different, IMO. A hack-y solution could `purl` the current document (with all related caveats!), split the result by chunk, evaluate their contents using your parallelization tool of choice and finally (somehow) save the outputs of the parallelized evaluation in a way that knitr [cont.] – CL. Nov 09 '19 at 14:33
  • ... [cont.] recognizes it as cache. All this should happen in the first chunk. Then evaluation could continue regularly, using the previously generated cache. But this has so many drawbacks and preconditions that I would not recommend trying it. Good luck! – CL. Nov 09 '19 at 14:36
  • I've [implemented](https://github.com/rubenarslan/codebook/blob/master/R/codebook.R#L139) the variant with `future` and `knit_child` in the `codebook` package using the [`rmdpartials`](https://cran.r-project.org/web/packages/rmdpartials/index.html) package. I looked into parallel processing of chunks without this explicit parallelisation (which of course leads to less readable code), and didn't find a way. – Ruben Nov 23 '21 at 08:12

0 Answers