
I am trying to speed up my code with a 'foreach' loop using the doSMP package.

Here is a simplified version of my issue. I am running a file called main.R.

file: main.R:

require(doSMP)
dropbox_path = "/home/ruser/Dropbox"
workers <- startWorkers(4)
registerDoSMP(workers)
foreach(jj=1:4 ) %dopar% source("test.R")
stopWorkers(workers)

file: test.R:

message(dropbox_path)

This returns the following error: Error in source("test.R") : task 1 failed - "object 'dropbox_path' not found"

If I change main.R to:

require(doSMP)
dropbox_path = "/home/ruser/Dropbox"
workers <- startWorkers(4)
registerDoSMP(workers)
foreach(jj=1:4 ) %dopar% message(dropbox_path)
stopWorkers(workers)

It then works very well. It also used to work well with sequential code (a 'for' loop instead of 'foreach').

So the R child processes can access the dropbox_path variable, but not when it is used inside a script run with the source() function. I tried playing with the 'local' and 'chdir' arguments of source(), without success.

Is there a way to make this code work? I would like to keep using the source() function.
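A likely explanation, sketched here and not stated in the original post: source() evaluates a file in the global environment unless its `local` argument says otherwise, and a worker's global environment does not contain dropbox_path. In the second main.R, foreach can see dropbox_path in the loop expression and ship it to the workers, but it cannot look inside test.R. The base-R sketch below simulates this without any parallel backend; the temporary file standing in for test.R and the `worker_env` environment standing in for a worker's task environment are both hypothetical.

```r
# Hypothetical stand-in for test.R: it only needs `dropbox_path`.
script <- tempfile(fileext = ".R")
writeLines("paste('Dropbox is at', dropbox_path)", script)

# Hypothetical environment playing the role of a worker's task environment.
# It holds `dropbox_path`; the global environment does not.
worker_env <- new.env()
assign("dropbox_path", "/home/ruser/Dropbox", envir = worker_env)

# Default source() evaluates in the global environment, so the lookup fails
# (assuming this session has no global `dropbox_path`).
default_result <- tryCatch(source(script)$value,
                           error = function(e) conditionMessage(e))

# Pointing `local` at the environment that holds the variable makes it visible.
local_result <- source(script, local = worker_env)$value

print(default_result)
print(local_result)
```

In main.R the corresponding change would presumably be foreach(jj = 1:4, .export = "dropbox_path") %dopar% source("test.R", local = TRUE), since `.export` is foreach's argument for shipping variables the loop expression does not mention and `local = TRUE` makes source() evaluate in the calling environment; this has not been checked against doSMP.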

Edited by Gavin Simpson; asked by Sam.

1 Answer


I apologize in advance that my little workaround doesn't use your tools, but here's how I would do it. I use the snowfall package because it lets me easily extend my apply functions to multiple cores. The code is untested because all of my cores are currently occupied, but it should work.

tiny_script.R contents:

date()

R code to fire up multiple cores:

library(snowfall)
sfInit(parallel = TRUE, cpus = 4, type = "SOCK") #power up
my.list <- vector("list", 10)

sfLapply(x = my.list, fun = function(x) source("./Odpad/tiny_script.R"))
sfStop() # shut down the workers when done

Running on a single core using lapply only:

> lapply(X = my.list, FUN = function(x) source("./Odpad/tiny_script.R")) #notice the difference in argument names between `lapply` and `sfLapply`.
[[1]]
[[1]]$value
[1] "Wed Jun 22 13:02:11 2011"

[[1]]$visible
[1] TRUE


[[2]]
[[2]]$value
[1] "Wed Jun 22 13:02:11 2011"

Answered by Roman Luštrik.
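For comparison, a sketch of the same export-then-source pattern using R's bundled `parallel` package instead of snowfall (a swapped-in technique, not what the answer uses). Here clusterExport() plays the role that sfExport() plays in the comments on this answer: it copies a variable into each worker's global environment, which is where source() looks by default. The script contents and the dropbox_path value are illustrative assumptions.

```r
library(parallel)  # ships with R; used here in place of snowfall

# Hypothetical stand-in for tiny_script.R that also reads a variable.
script <- tempfile(fileext = ".R")
writeLines("paste(date(), dropbox_path)", script)

dropbox_path <- "/home/ruser/Dropbox"

cl <- makeCluster(2)  # two socket-based worker processes on this machine

# Copy `dropbox_path` into each worker's global environment (the job that
# sfExport() does in snowfall).
clusterExport(cl, "dropbox_path")

# Pass the script path as an argument: the workers' global environments
# do not contain `script`.
res <- parLapply(cl, 1:4, function(jj, path) source(path)$value, path = script)

stopCluster(cl)
print(res)
```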
  • Hi Roman, I tested your code and get the same issue: your tiny_script.R does not use any variable. If instead of date() the script calls something that needs a variable, such as my message(dropbox_path) where dropbox_path is in the global environment, you get the same error. My question is: how can a script run with source() use variables from the parent environment (which the child process can access, as my second example shows)? – Sam Jun 22 '11 at 11:52
  • @Sam You will have to export all your variables to cores. `sfExport(list = c("var1", "var2", "function1"))`. – Roman Luštrik Jun 22 '11 at 11:55
  • OK, the message() function does not work, but the script run by source(script.R) does have access to the variable. I will see if I can use the snowfall package instead of doSMP. – Sam Jun 22 '11 at 12:13
  • @Sam So far, I have been unsuccessful at writing to the console from a parallel session (I think of each core process as an individual R console, and from what I can tell, you can't message from one console to another). Appending to a file should work, though. – Roman Luštrik Jun 22 '11 at 12:15
  • That's OK, I can append to a file. What I do not know, using `snowfall` instead of `doSMP`, is how each core can access the iterator: `sfLapply(x = my.list,f(x))` replaces a loop `for(x in my.list){f(x)}`. How can each core access its own value of `x`? – Sam Jun 22 '11 at 21:05
  • @Sam That's a tricky one. `snowfall` works best with the `apply` family of functions and was, AFAICT, not designed to work with iterators as such. There are possible work-arounds, see for instance here: http://stackoverflow.com/questions/4164960/which-list-element-is-being-processed-when-using-snowfallsflapply – Roman Luštrik Jun 23 '11 at 05:34
  • Thanks Roman, I managed to get the trick from your link working. However, back to my initial issue: while the code in the script called by `source(test.R)` can access variables exported with `sfExport()`, it cannot access variables created within the function `f` (the function passed to `sfLapply`). Would you have a solution for that as well? – Sam Jun 23 '11 at 16:39
  • @Sam You cannot move data from one core to another that easily just yet. See the discussion and some possible solutions in one of my recent questions: http://stackoverflow.com/questions/6251662/writing-to-global-environment-when-running-in-parallel – Roman Luštrik Jun 27 '11 at 11:11
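Sam's last question (variables created inside `f` are invisible to the sourced script) looks like the same scoping issue as the original one: source() evaluates in the global environment by default, not in the frame of the function that calls it. Below is a base-R sketch of one possible fix, using plain lapply() in place of sfLapply(); the script contents, `f`, and `out_file` are hypothetical stand-ins. Iterating over seq_along(my.list) also gives each call its own index, which is one way to handle the iterator question.

```r
# Hypothetical stand-in for test.R: it uses the index `i` and a variable
# `out_file` that only exists inside the task function.
script <- tempfile(fileext = ".R")
writeLines("sprintf('task %d writes to %s', i, out_file)", script)

my.list <- vector("list", 4)

# Hypothetical task function, the kind of `f` one would hand to sfLapply().
f <- function(i) {
  out_file <- file.path(tempdir(), paste0("result_", i, ".txt"))
  # local = TRUE evaluates the script in this call's frame, so the script
  # can see both `i` and `out_file`.
  source(script, local = TRUE)$value
}

# Iterate over indices so each call knows which element it is handling.
res <- lapply(seq_along(my.list), f)
print(res[[2]])
```

Swapping in sfLapply(seq_along(my.list), f) should behave the same, provided the workers can read the script file and `script` has been exported (for example with sfExport("script")), because `f` looks it up in the global environment.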