I'm using the foreach package to try to parallelize a function of mine on Windows (this was the only approach to parallelization I could follow easily). I basically need to call a function for g = 1, then g = 2, etc., and want to do this faster.
- My function works perfectly fine with a regular for loop or with %do% instead of %dopar%.
- I believe I am passing all of the packages I use and, hopefully, the correct variables/objects.
- But I have very little understanding of parallelization and nodes, and the errors don't give me enough to troubleshoot with.
- I have only included my main function, not all of the other functions it calls, but I can provide them if needed.
- I would appreciate any help with this error, with parallelizing on Windows in general, and with what I need to keep in mind so that code which works with %do% also works across the nodes created by %dopar% (see the toy sketch after this list).
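For reference, here is the basic pattern I think I am supposed to be following, where the last expression of each iteration is collected into the list that foreach returns. This is only a toy sketch of my understanding (square_it is a made-up stand-in for my real find_groups call, not part of my actual code):
#toy sketch: collect one result per g via foreach's return value
library(foreach)
library(doParallel)
square_it <- function(g) list(g = g, value = g^2) #stand-in for my real function
registerDoParallel(cores = 2)
toy_out <- foreach(g = 1:4) %dopar% {
  square_it(g) #last expression is the value collected for this iteration
}
toy_out[[3]]$value #9
stopImplicitCluster()
Part of what I am unsure about is whether assignments like out[[g]] <- ... inside the %dopar% block behave the same way they do in a regular for loop.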
Thank you very much for your help!!
My code:
#agonize parallel
#main function
par_agonize <- function(datfile, num_groups, regen_pref_matrices = FALSE, graph_groups = num_groups) {
  if (regen_pref_matrices) mm <- gen_pref_matrices(datfile)
  out <- list()
  tic.clearlog()
  improve <- tibble(groups = numeric(), agony = numeric(), abs_dec = numeric(), percent_dec = numeric(),
                    total_dec = numeric(), tot_per_dec = numeric())

  foreach(g = 1:num_groups, .packages = loaded.package.names, .export = c(loaded.functions, loaded.objects), .verbose = TRUE) %dopar% { #key line where I use dopar/foreach
    tic()
    out[[g]] <- find_groups(mm, g) #this is the critical line, the improve and tic/toc log are just accessories
    toc(log = TRUE, quiet = FALSE) #calculates time
    log.lst <- tic.log(format = FALSE)
    if (g == 1) { #this calculates summary statistics, not important
      improve <- add_row(improve, groups = g, agony = out[[g]]$ag, abs_dec = 0, percent_dec = 0, total_dec = 0, tot_per_dec = 0)
    } else {
      improve <- add_row(improve, groups = g, agony = out[[g]]$ag, abs_dec = out[[g]]$ag - out[[g-1]]$ag,
                         percent_dec = (out[[g]]$ag - out[[g-1]]$ag) / (out[[g-1]]$ag), total_dec = out[[g]]$ag - out[[1]]$ag,
                         tot_per_dec = (out[[g]]$ag - out[[1]]$ag) / (out[[1]]$ag))
    }
  }

  #just saves output to my list
  out[["summary_stats"]] <- improve
  out[["timings"]] <- tibble(num_groups = 1:g, run_time = unlist(lapply(log.lst, function(x) x$toc - x$tic))) %>%
    add_row("num_groups" = "Total", "run_time" = sum(out[["timings"]]$run_time[1:g]))
  out[["agony_graph"]] <- graph_agony(out, graph_groups)
  social_rank <<- out
  return(social_rank$agony_graph)
}
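One thing I am also unsure about is the timing: I don't know whether the tic/toc log from tictoc is shared between my session and the workers. Here is a toy sketch of how I would time each iteration by returning the elapsed seconds from the loop body instead (Sys.sleep is just a stand-in for real work, not my actual code):
#toy timing sketch: return elapsed seconds from each iteration instead of reading tic.log() afterwards
library(foreach)
timed <- foreach(g = 1:3) %do% {
  t0 <- Sys.time()
  Sys.sleep(0.1) #stand-in for find_groups(mm, g)
  list(g = g, secs = as.numeric(difftime(Sys.time(), t0, units = "secs")))
}
sapply(timed, function(x) x$secs)
The rest of my actual code (the test part) is below.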
#test code
registerDoParallel(cores = detectCores() - 1)
loaded.package.names <- c(sessionInfo()$basePkgs, names(sessionInfo()$otherPkgs))
loaded.package.names #works
loaded.functions <- c("assign_groups", "find_agony", "find_groups", "generate_hierarchy", "gen_pref_matrices", "graph_agony", "init")
loaded.objects <- c("mm") #I can regenerate mm within my code... or use the mm that's already there, so I figured I would export it too
system.time(par_agonize("./data/hof17.csv", 2, regen=F)) #this is the MAIN line that runs my function
stopCluster(cl) #not clear if needed
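For completeness, the other setup style I have seen (and, I assume, where a cl object for stopCluster(cl) would come from) creates an explicit cluster and registers it. This is only a sketch of my understanding, not code I am currently running:
#alternative setup I have seen for Windows: explicit PSOCK cluster
library(doParallel) #also attaches foreach and parallel
cl <- makeCluster(detectCores() - 1)
registerDoParallel(cl)
res <- foreach(i = 1:3, .combine = c) %dopar% i #tiny check that the workers respond
stopCluster(cl)
With registerDoParallel(cores = ...) as in my code above, there is no cl object, which is why I was not sure whether stopCluster(cl) applies in my case.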
My current error, with the verbose output from foreach, is:
automatically exporting the following variables from the local environment:
  improve, out
explicitly exporting variables(s): assign_groups, find_agony, find_groups, generate_hierarchy, gen_pref_matrices, graph_agony, init, mm
numValues: 2, numResults: 0, stopped: TRUE
got results for task 1
numValues: 2, numResults: 1, stopped: TRUE
returning status FALSE
got results for task 2
accumulate got an error result
numValues: 2, numResults: 2, stopped: TRUE
calling combine function
evaluating call object to combine results:
fun(accum, result.1)
returning status TRUE
Error in { : task 2 failed - "replacement has length zero"