I want each Python worker to start an R shell using rpy2. Can I do this during some sort of setup phase, similar to what I assume happens when you import a Python module that later executor tasks use? For example:
import numpy as np
df.mapPartitions(lambda x: np.zeros(x))
In my case I want to instead start an R shell on each executor and import R libraries, which would look something like this:
import rpy2.robjects as robjects
from rpy2.robjects.packages import importr
rlibrary = importr('testrlibrary')
df.mapPartitions(lambda x: rlibrary.rfunc(x))
But I don't want this to occur inside the call to mapPartitions, because then it would happen at the task level rather than once per executor core. That approach works, and looks like the example below, but it is not useful for me.
def model(partition):
    # Everything here runs once per task, not once per executor core.
    import rpy2.robjects as robjects
    from rpy2.robjects.packages import importr
    rlibrary = importr('testrlibrary')
    return rlibrary.rfunc(partition)
df.mapPartitions(model)