I want to use the foreach package to parallelize a for loop.

The original code looks like this:

data_df=data.frame(...) # the data frame where original data stored
result_df=data.frame(...) # the data frame where result data to be stored

for(i in 1:10)
{
     a=data_df[i,]$a
     b=data_df[i,]$b
     sum_result=a+b
     sub_result=a-b
     result_df[i,]$sum_result=sum_result
     result_df[i,]$sub_result=sub_result
}

I use the index i as the row number, both to read values from one data frame and to store the results in another data frame.

However, if I change:

for(i in 1:10)

to

foreach( i=1:10) %dopar% 

it runs very fast, but the result seems to be stored in only one column of the data frame. How can I save both columns together?

How should I write to the shared data frame so that the loop can run in parallel?

Sample data for data_df:

a   b
1   1
2   4
4   8
9   6
2   3
lserlohn
  • In parallelisation, each child process gets a new environment, so at the end you need to return the data.frame so that the output of each child process can be combined by the parent process. – joel.wilson Nov 11 '16 at 08:46
  • Also, please add some sample data for us to work with! – joel.wilson Nov 11 '16 at 08:47
  • Thanks for pointing that out, I have added it. – lserlohn Nov 11 '16 at 08:55
  • @lserlohn Does my answer address your question? If not, please add a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – Prradep Nov 11 '16 at 09:07
  • Thanks, all. I found that my problem is more complicated: how do I save two columns together? With the rbind option I can only save one. – lserlohn Nov 11 '16 at 17:02

2 Answers

You should use .combine = rbind:

result = foreach(i = 1:5, .combine = rbind) %dopar% {
  data.frame(x = runif(40), i = i)
}

> head(result)
          x i
1 0.2777559 1
2 0.2126995 1
3 0.2847905 1
4 0.8950941 1
5 0.4462353 1
6 0.7799849 1
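
Applied to the data in the question, the same idea could look roughly like the sketch below. It assumes the doParallel backend and an arbitrary choice of 2 workers (neither is specified in the question). Each iteration returns a one-row data frame holding both values, and .combine = rbind stacks those rows in iteration order, so no worker ever writes into a shared data frame.

```r
library(foreach)
library(doParallel)

# Sample data taken from the question
data_df <- data.frame(a = c(1, 2, 4, 9, 2), b = c(1, 4, 8, 6, 3))

# Assumption: a local cluster with 2 workers; adjust to your machine
cl <- makeCluster(2)
registerDoParallel(cl)

# Each iteration returns a one-row data frame with both results;
# rbind stacks the rows in iteration order
result_df <- foreach(i = seq_len(nrow(data_df)), .combine = rbind) %dopar% {
  a <- data_df$a[i]
  b <- data_df$b[i]
  data.frame(sum_result = a + b, sub_result = a - b)
}

stopCluster(cl)
print(result_df)
```

The value of the last expression in the loop body is what foreach collects, so assignments to an outer result_df inside the body would only change the worker's own copy.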
Prradep
  • I revised my question. Could you help with how to return a data frame when values are assigned sequentially? – lserlohn Nov 11 '16 at 18:28

You could do this:

require("doParallel")
require("foreach")
registerDoParallel(cores=detectCores())
n <- nrow(data_df)
res <- foreach(i=1:n, .combine=rbind) %dopar% {
    data_df[i,]$a + data_df[i,]$b
}

data_df

#   a  b
# 1 1  6
# 2 2  7
# 3 3  8
# 4 4  9
# 5 5 10

res
#          [,1]
# result.1    7
# result.2    9
# result.3   11
# result.4   13
# result.5   15

data

data_df <- structure(list(a = 1:5, b = 6:10), .Names = c("a", "b"), row.names = c(NA, 
-5L), class = "data.frame")
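
To get two result columns instead of one, a possible variation is to return a named vector from each iteration. This is only a sketch: the column names sum_result and sub_result come from the question, and the 2-worker cluster is an assumption. With .combine=rbind the named vectors become the rows of a two-column matrix, which can then be converted to a data frame.

```r
library(foreach)
library(doParallel)

# Same data as above
data_df <- data.frame(a = 1:5, b = 6:10)

# Assumption: a local cluster with 2 workers
cl <- makeCluster(2)
registerDoParallel(cl)

# Each iteration returns a named vector; rbind turns them into matrix rows
res <- foreach(i = seq_len(nrow(data_df)), .combine = rbind) %dopar% {
  c(sum_result = data_df$a[i] + data_df$b[i],
    sub_result = data_df$a[i] - data_df$b[i])
}

stopCluster(cl)

# Drop the automatic row names and convert the matrix to a data frame
rownames(res) <- NULL
result_df <- as.data.frame(res)
print(result_df)
```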
989
  • Thanks. After carefully examining the code, I found that my problem is how to output a two-column result in a data frame. Could you please take a look at the new code? – lserlohn Nov 11 '16 at 16:58