UPDATE: I tried Rcpp to see whether I could speed up that particular line of R, but I decided not to pursue it: my one attempt took a long time, and Dirk's comment on the post How can I translate a function in R to Rcpp? basically says that an R function called from C++ will not be sped up.
I have a script containing a function that creates word2vec embeddings according to the arguments provided, then runs R code (embedded via rpy2) to do some statistical computations, and finally uses the output of that R code.
def run_everything(*args):
    # I create word embeddings and calculate cosine similarities here, then write them to csv
    # I call an R function through the rpy2 embedding, apply some computations to identify outliers, and write them to csv again
    # I use the output of the R function for the final operations and write another csv
    ...
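For concreteness, here is a minimal sketch of what the three steps could look like; the library choices (gensim for word2vec, scikit-learn for cosine similarity), the parameter names, and the file names are assumptions, and the R step is sketched separately at the end of the post:

import numpy as np
from gensim.models import Word2Vec
from sklearn.metrics.pairwise import cosine_similarity

def run_everything(sentences, vector_size, window, min_count, out_prefix):
    # Step 1: train embeddings and write pairwise cosine similarities to csv
    model = Word2Vec(sentences, vector_size=vector_size,
                     window=window, min_count=min_count)
    sims = cosine_similarity(model.wv.vectors)
    np.savetxt(f"{out_prefix}_similarities.csv", sims, delimiter=",")
    # Step 2: call the R outlier routine through rpy2 (sketched below) and write its output to csv
    # Step 3: post-process the R output and write the final csv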
Since I am trying a wide range of hyperparameters for my embeddings, I call this function from a different script and run it with multiprocessing.Pool.
from build_model_and_run_r_code import run_everything
from multiprocessing import Pool

# I specify the args_for_pool as input for the function here

if __name__ == '__main__':
    with Pool(8) as pool:
        pool.starmap(run_everything, args_for_pool)
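For reference, a sketch of how args_for_pool could be built as the cartesian product of the hyperparameter grids, one tuple per run_everything call (the grid names and values are assumptions, chosen here so the product gives the 48 combinations mentioned below):

from itertools import product

vector_sizes = [100, 200, 300, 400]  # assumed embedding dimensions
windows = [3, 5, 8, 10]              # assumed context window sizes
min_counts = [1, 3, 5]               # assumed minimum word frequencies

args_for_pool = list(product(vector_sizes, windows, min_counts))  # 48 tuples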
Two things could be better:
- Despite my machine having 12 CPUs and the Pool using only 8 workers, the computer slows down significantly while the code runs.
- If I provide 48 different combinations of hyperparameters, everything finishes in about 3 hours. While that is already less than half the time it would take to run each combination sequentially, I wonder whether I am neglecting some aspect that could make it even more efficient. The steps before and after the R code run fairly quickly; it is the computation in R that takes most of the time. The specific R line responsible is:
outly = compBagplot(dat_filt, "sprojdepth", options=list(maxiter = 500))
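For completeness, a minimal sketch of how that call might look from the Python side through rpy2 (the wrapper function and conversion details are assumptions; compBagplot itself comes from the mrfDepth package):

import numpy as np
import rpy2.robjects as ro
from rpy2.robjects import numpy2ri
from rpy2.robjects.conversion import localconverter
from rpy2.robjects.packages import importr

mrfDepth = importr('mrfDepth')  # R package that provides compBagplot

def find_outliers(dat_filt: np.ndarray):
    # Convert the numpy matrix to an R matrix, then call compBagplot.
    # Each Pool worker embeds its own R instance, so this runs once per process.
    with localconverter(ro.default_converter + numpy2ri.converter):
        r_mat = ro.conversion.py2rpy(dat_filt)
    return mrfDepth.compBagplot(r_mat, type='sprojdepth',
                                options=ro.ListVector({'maxiter': 500}))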