I am trying to use doParallel to apply a function to each row of a data.frame instead of using apply. I posted a question about this earlier here:
The answer posted to my earlier question worked for the example I provided at that time but does not work with the example below. In my current example I create a trivial function that, for each row of one data.frame, simply renames the columns of a second data.frame. If I do not rename the columns of the second data.frame in the function, then the answer posted to my earlier question works.
Here is the first data.frame:
df1 <- read.table(text = '
aa bb cc dd ee
1 7 8 9 10
2 70 80 90 100
', header = TRUE, stringsAsFactors = FALSE)
Here is the second data.frame:
df2 <- read.table(text = '
SS TT YY UU II
4 5 6 CH 7
44 55 66 CH 77
', header = TRUE)
Here is my function:
my.function1 <- function(aa, bb, cc, dd, ee) {
  # df2 is looked up outside the function; the rename only changes a local copy
  colnames(df2) <- c('qqq', 'www', 'eee', 'rrr', 'ttt')
  return(list(new.df = df2))
}
The apply statement works:
function.output <- apply(df1, 1, function(x) {my.function1(x[1], x[2], x[3], x[4], x[5])})
function.output
#[[1]]
#[[1]]$new.df
# qqq www eee rrr ttt
#1 4 5 6 CH 7
#2 44 55 66 CH 77
#
#
#[[2]]
#[[2]]$new.df
# qqq www eee rrr ttt
#1 4 5 6 CH 7
#2 44 55 66 CH 77
The doParallel statement does not work:
library(doParallel)
dat1 <- df1
ncores1 <- detectCores()-5
c1 <- parallel::makeCluster(ncores1)
registerDoParallel(c1)
v1 <- foreach(i = 1:nrow(dat1)) %dopar% {
  my.function1(dat1[i,1], dat1[i,2], dat1[i,3], dat1[i,4], dat1[i,5])
}
#Error in { : task 1 failed - "object 'df2' not found"
v1
#Error: object 'v1' not found
stopCluster(c1)
If I modify the function so it does not rename the columns of the second data.frame, then the doParallel statement does work:
my.function2 <- function(aa, bb, cc, dd, ee) {
  return(list(new.df = df2))
}
dat2 <- df1
ncores2 <- detectCores()-5
c2 <- parallel::makeCluster(ncores2)
registerDoParallel(c2)
v2 <- foreach(i = 1:nrow(dat2)) %dopar% {
  my.function2(dat2[i,1], dat2[i,2], dat2[i,3], dat2[i,4], dat2[i,5])
}
v2
#[[1]]
#[[1]]$new.df
# SS TT YY UU II
#1 4 5 6 CH 7
#2 44 55 66 CH 77
#
#
#[[2]]
#[[2]]$new.df
# SS TT YY UU II
#1 4 5 6 CH 7
#2 44 55 66 CH 77
stopCluster(c2)
How can I get the doParallel statement to work while renaming the columns of the second data.frame in the function?
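For reference, here is a minimal sketch of one direction that might work. It assumes the error comes from the workers never receiving df2, since df2 is only mentioned inside the function and not in the foreach body. The names my.function3, other.df, c3 and v3 are hypothetical and not part of my code above, and the cluster size of 2 is an arbitrary assumption.

```r
library(doParallel)  # also attaches foreach and parallel

df1 <- read.table(text = '
aa bb cc dd ee
1 7 8 9 10
2 70 80 90 100
', header = TRUE, stringsAsFactors = FALSE)

df2 <- read.table(text = '
SS TT YY UU II
4 5 6 CH 7
44 55 66 CH 77
', header = TRUE)

# Hypothetical variant of my.function1: the second data.frame arrives as the
# argument `other.df` instead of being looked up as a global inside the function.
my.function3 <- function(aa, bb, cc, dd, ee, other.df) {
  colnames(other.df) <- c('qqq', 'www', 'eee', 'rrr', 'ttt')
  list(new.df = other.df)
}

c3 <- parallel::makeCluster(2)  # assumption: two workers are enough here
registerDoParallel(c3)

v3 <- foreach(i = 1:nrow(df1)) %dopar% {
  # df2 is named directly in the loop body, so foreach can ship it to the workers
  my.function3(df1[i, 1], df1[i, 2], df1[i, 3], df1[i, 4], df1[i, 5], df2)
}

stopCluster(c3)
v3
```

If changing the function signature is not an option, foreach also has an .export argument (for example .export = "df2") that is intended for variables the loop body does not mention directly; that may be worth trying as well, though it is not tested here.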