
I have a complex for loop in my script which takes too long to finish (more than 1 hour). If possible, I would like to use more than one core of my CPU to reduce the computation time.

Is it possible to use parallelization to run my loop?

General data frame

TabR1

Vector of codes used to select each station

vecsandre

Function

copy <- function (m) {
  for (i in 1:m) {
    TEST[[i]] <- TabR1[TabR1$CdStationMesureEauxSurface == vecsandre[i],]
  }
}

List to hold the selections

TEST=list()

library(doParallel)
no_cores <- detectCores() - 1
grappe <- makeCluster(no_cores)
registerDoParallel(no_cores)
system.time(foreach(z=1:10) %dopar% copy(z))
stopCluster(grappe)

I tried this, but I get an error (the French message means "object 'TEST' not found"):

Error in copy(z) : task 1 failed - "objet 'TEST' introuvable"

xavier363
    Welcome to StackOverflow! Please read the info about [how to ask a good question](http://stackoverflow.com/help/how-to-ask) and how to give a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610). This will make it much easier for others to help you. – Jaap Feb 20 '18 at 15:28

2 Answers


In R, foreach loops work differently from for loops. Consider this:

> x = for (i in 1:3) {i^2}
> x
NULL
> y = foreach(i = 1:3) %do% {i^2}
> y
[[1]]
[1] 1

[[2]]
[1] 4

[[3]]
[1] 9

The for loop does not return anything. It just repeats the expression within {} a given number of times. It is up to you to capture the results of these iterations.

The foreach loop is more like a function. Each iteration returns the value of the expression in {} (or of the last expression, if there is more than one). By default, foreach then combines the results of all iterations into a list and returns that list. You cannot assign intermediate results from foreach iterations to an outside object such as TEST, because foreach exists to run each iteration in a (potentially) different process, and with %dopar% an assignment made on a worker does not reach the object in your main session. This design avoids problems with concurrent access to the results container; a for loop does not have this issue because it is sequential.
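Applied to the question, this means letting each iteration return its subset and assigning the list that foreach gives back, instead of writing into TEST from inside the loop. Below is a minimal sketch under stated assumptions: iris stands in for TabR1, the species names stand in for vecsandre, and the worker count of 2 is arbitrary. It also registers the cluster object itself, which is a choice made for this sketch and not something taken from the question.

```r
library(doParallel)  # also attaches foreach and parallel

# Stand-ins for the question's objects (assumption): iris plays the role of
# TabR1 and the species names play the role of vecsandre.
TabR1 <- iris
vecsandre <- c("setosa", "versicolor", "virginica")

# Two workers for this sketch; adjust to your machine.
grappe <- makeCluster(2)
registerDoParallel(grappe)  # register the cluster object created above

# Each iteration returns its subset; foreach collects the subsets in a list.
# TabR1 and vecsandre are referenced in the loop body, so foreach normally
# exports them to the workers when called from the top level.
TEST <- foreach(i = seq_along(vecsandre)) %dopar% {
  TabR1[TabR1$Species == vecsandre[i], ]
}

stopCluster(grappe)

# Label each element with the code it was selected by.
names(TEST) <- vecsandre
sapply(TEST, nrow)
```

Whether this is faster than a sequential loop depends on the data: the data frame has to be copied to every worker, so for simple subsetting the transfer cost can outweigh the gain.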

kgolyaev
  • If I understand correctly, with a foreach loop it is not possible to modify the initial data frame, because several parallel processes would be accessing that single data frame. But with a list() it is nearly the same: when each process ends, it writes its result into the list... Is there no way to modify a single data frame? Thanks for the help! – xavier363 Feb 22 '18 at 08:12

Here's a vectorized approach, using iris as a minimal reproducible example:

keep_vec <- c("setosa", "versicolor")
split_df <- split(iris, iris$Species)
keep_dfs <- split_df[keep_vec]

# $setosa
   # Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1           5.1         3.5          1.4         0.2  setosa
# 2           4.9         3.0          1.4         0.2  setosa
# 3           4.7         3.2          1.3         0.2  setosa
# ...
# $versicolor
    # Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
# 51           7.0         3.2          4.7         1.4 versicolor
# 52           6.4         3.2          4.5         1.5 versicolor
# 53           6.9         3.1          4.9         1.5 versicolor

With your data, try:

vecsandre
split_df <- split(TabR1, TabR1$CdStationMesureEauxSurface)
keep_dfs <- split_df[vecsandre]
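
One thing to watch for: split() names the list elements with the codes as strings, so if vecsandre holds numeric station codes, split_df[vecsandre] would index by position, not by name. The sketch below uses made-up station codes and values (hypothetical data, not from the question) and converts the codes with as.character() before indexing.

```r
# Hypothetical data: the station codes and values below are made up.
TabR1 <- data.frame(
  CdStationMesureEauxSurface = c(3001, 3001, 3002, 3003, 3003),
  value = c(1.2, 3.4, 5.6, 7.8, 9.0)
)
vecsandre <- c(3003, 3001)

# Split once by station code; the list names are the codes as strings.
split_df <- split(TabR1, TabR1$CdStationMesureEauxSurface)

# Index by name, not by position: convert the codes to character first.
keep_dfs <- split_df[as.character(vecsandre)]

names(keep_dfs)
sapply(keep_dfs, nrow)
```

If vecsandre is already a character vector, the conversion is a no-op and the original line works as written.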
CPak