
I am trying to use doParallel to apply a function to each row of a data.frame instead of using apply. I posted a question about this earlier here:

doParallel instead of apply

The answer to my earlier question worked for the example I provided at that time, but it does not work with the example below. In my current example I create a trivial function that, for each row of one data.frame, simply renames the columns of a second data.frame. If the function does not rename the columns of the second data.frame, the answer to my earlier question works.

Here is the first data.frame:

df1 <- read.table(text = '
    aa bb cc dd  ee
     1  7  8  9  10
     2 70 80 90 100
', header = TRUE, stringsAsFactors = FALSE)

Here is the second data.frame:

df2 <- read.table(text = '
   SS TT YY UU II
    4  5  6 CH  7
   44 55 66 CH 77
', header = TRUE)

Here is my function:

my.function1 <- function(aa, bb, cc, dd, ee) {
  colnames(df2) <- c('qqq', 'www', 'eee', 'rrr', 'ttt')
  return = list(new.df=df2)
}

The apply statement works:

function.output <- apply(df1, 1, function(x) {my.function1(x[1], x[2], x[3], x[4], x[5])})
function.output
#[[1]]
#[[1]]$new.df
#  qqq www eee rrr ttt
#1   4   5   6  CH   7
#2  44  55  66  CH  77
#
#
#[[2]]
#[[2]]$new.df
#  qqq www eee rrr ttt
#1   4   5   6  CH   7
#2  44  55  66  CH  77

The doParallel statement does not work:

library(doParallel)
dat1 <- df1
ncores1 <- detectCores()-5
c1 <- parallel::makeCluster(ncores1)
registerDoParallel(c1)

v1 <- foreach(i = 1:nrow(dat1)) %dopar% {
     my.function1(dat1[i,1], dat1[i,2], dat1[i,3], dat1[i,4], dat1[i,5])
}
#Error in { : task 1 failed - "object 'df2' not found"
v1
#Error: object 'v1' not found
stopCluster(c1)

If I modify the function so it does not rename the columns of the second data.frame then the doParallel statement does work:

my.function2 <- function(aa, bb, cc, dd, ee) {
  return = list(new.df=df2)
}

dat2 <- df1
ncores2 <- detectCores()-5
c2 <- parallel::makeCluster(ncores2)
registerDoParallel(c2)

v2 <- foreach(i = 1:nrow(dat2)) %dopar% {
     my.function2(dat2[i,1], dat2[i,2], dat2[i,3], dat2[i,4], dat2[i,5])
}
v2
#[[1]]
#[[1]]$new.df
#  SS TT YY UU II
#1  4  5  6 CH  7
#2 44 55 66 CH 77
#
#
#[[2]]
#[[2]]$new.df
#  SS TT YY UU II
#1  4  5  6 CH  7
#2 44 55 66 CH 77
stopCluster(c2)

How can I get the doParallel statement to work while renaming the columns of the second data.frame in the function?

Mark Miller

1 Answer


Use the `.export` argument of `foreach` to send `df2` to the workers:

v1 <- foreach(i = 1:nrow(dat1), .export = "df2") %dopar% {
     my.function1(dat1[i,1], dat1[i,2], dat1[i,3], dat1[i,4], dat1[i,5])
}

Output:

> v1
[[1]]
[[1]]$new.df
  qqq www eee rrr ttt
1   4   5   6  CH   7
2  44  55  66  CH  77


[[2]]
[[2]]$new.df
  qqq www eee rrr ttt
1   4   5   6  CH   7
2  44  55  66  CH  77
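
For reference, here is a minimal end-to-end sketch of the fix, including cluster setup and cleanup. It rebuilds the question's data with `data.frame()` instead of `read.table()`, and the worker count of 2 and the name `cl` are assumptions made for illustration, not part of the original code.

```r
library(doParallel)

# Same data as in the question, built directly with data.frame()
df1 <- data.frame(aa = c(1, 2), bb = c(7, 70), cc = c(8, 80),
                  dd = c(9, 90), ee = c(10, 100))
df2 <- data.frame(SS = c(4, 44), TT = c(5, 55), YY = c(6, 66),
                  UU = c("CH", "CH"), II = c(7, 77))

# Renames the columns of a copy of df2; the global df2 is not modified
my.function1 <- function(aa, bb, cc, dd, ee) {
  colnames(df2) <- c("qqq", "www", "eee", "rrr", "ttt")
  list(new.df = df2)
}

# Assumption: two workers; adjust to the machine
cl <- parallel::makeCluster(2)
registerDoParallel(cl)

# .export names the extra objects the workers need
v1 <- foreach(i = seq_len(nrow(df1)), .export = "df2") %dopar% {
  my.function1(df1[i, 1], df1[i, 2], df1[i, 3], df1[i, 4], df1[i, 5])
}

parallel::stopCluster(cl)

colnames(v1[[1]]$new.df)
```

Stopping the cluster after the loop releases the worker processes whether or not the result is used afterwards.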
akrun
  • If I have three data.frames and want to rename the columns of two of them in the function can I use something like `.export = c("df2", "df3")` or `.export = list("df2", "df3")`? – Mark Miller Nov 11 '21 at 16:56
  • @MarkMiller it should be `c("df2", "df3")` – akrun Nov 11 '21 at 16:56
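
The case from the comments, exporting two data.frames at once, might look like the sketch below. The data.frame `df3`, the function `rename.both`, the reduced two-column data, and the worker count of 2 are all hypothetical, introduced only to illustrate passing a character vector to `.export`.

```r
library(doParallel)

df1 <- data.frame(aa = c(1, 2), bb = c(7, 70))
df2 <- data.frame(SS = c(4, 44), TT = c(5, 55))
df3 <- data.frame(PP = c(1, 11), RR = c(2, 22))  # hypothetical third data.frame

# Hypothetical function that renames the columns of both df2 and df3
rename.both <- function(aa, bb) {
  colnames(df2) <- c("qqq", "www")
  colnames(df3) <- c("xxx", "yyy")
  list(new.df2 = df2, new.df3 = df3)
}

# Assumption: two workers; adjust to the machine
cl <- parallel::makeCluster(2)
registerDoParallel(cl)

# .export takes a character vector of object names, not a list
v <- foreach(i = seq_len(nrow(df1)), .export = c("df2", "df3")) %dopar% {
  rename.both(df1[i, 1], df1[i, 2])
}

parallel::stopCluster(cl)

colnames(v[[1]]$new.df3)
```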