I found the post "Shared array usage in Julia", which is clearly related, but I still don't understand what to do in my case.

I am trying to pass a shared array to a function I define, and call that function using @everywhere. The following, which has no shared array, works:

@everywhere mat = rand(3,3)
@everywhere foo1(x::Array) = det(x)

Then this

@everywhere println(foo1(mat))

properly produces different results from each worker. Now let me include a shared array:

test = SharedArray(Float64,10)
@everywhere foo2(x::Array,y::SharedArray) = det(x) + sum(y)

Then this

@everywhere println(foo2(mat,test))

fails on the workers.

ERROR: On worker 2:
UndefVarError: test not defined

etc. I can get what I want like this:

for w in procs()
    @spawnat w println(foo2(eval(:mat),test))
end

This works - but is it optimal? Is there a way to make it work with @everywhere?

  • Did you try @everywhere test =...? – rcpinto Jun 07 '16 at 03:43
  • Can you post a reproducible example where you get performance problems from your use of `@spawn`? I don't see anything wrong with the example at the end that you post using `@spawn`, nor could I reproduce performance problems with it. – Michael Ohlrogge Jun 07 '16 at 12:17
  • In playing with this more, I've come to believe, @aireties, that the performance issues are unrelated to whether or not `@everywhere` is used. I think I mixed in some other code differences when I made my comparisons. So my apologies for the ill-posed post. Not sure what the proper action to take is now - remove the post? – Leon Balents Jun 08 '16 at 01:04
  • I don't think you need to remove the post, just edit it a bit. The initial error that you report getting is one that I think other users may encounter, and the answer from @tholy is useful for that. I would just remove the references in the post to the slowdowns and leave it about the error message. I actually think it's a great initial question you asked. – Michael Ohlrogge Jun 08 '16 at 01:07
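
A minimal sketch of how the `@everywhere test = ...` idea from the first comment might be made to work, assuming Julia 1.x (where `SharedArray` lives in the `SharedArrays` standard library and `det` in `LinearAlgebra`, unlike the 2016-era syntax in the question). Writing `@everywhere test = SharedArray(...)` would presumably construct a separate array on each process, so the sketch interpolates the single array created on process 1 with `$` instead. The `addprocs(2)` call and the fill value are assumptions for illustration only.

```julia
using Distributed
addprocs(2)                                  # assumption: two local workers
@everywhere using SharedArrays, LinearAlgebra

@everywhere mat = rand(3, 3)                 # a different matrix on each process
@everywhere foo2(x::Array, y::SharedArray) = det(x) + sum(y)

test = SharedArray{Float64}(10)              # defined on process 1 only
fill!(test, 1.0)                             # illustrative contents

# `$test` is evaluated on process 1 and its value is sent along with the
# expression, so the workers do not need a global named `test`.
@everywhere println(foo2(mat, $test))
```

Without the `$`, each worker looks up `test` in its own `Main` module, which is where the `UndefVarError` in the question comes from.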

1 Answer

While it's tempting to use "named variables" on workers, it generally seems to work better if you access them via references. Schematically, you might do something like this:

mat = [@spawnat p rand(3,3) for p in workers()] # process 1 holds references to objects on workers
@sync for (i, p) in enumerate(workers())
    @spawnat p foo(mat[i], sharedarray)
end
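
A runnable variant of the schematic above might look like the following. It is a sketch assuming Julia 1.x with the `Distributed`, `SharedArrays`, and `LinearAlgebra` standard libraries; `foo`, `S`, `refs`, and the two local workers are hypothetical names and settings chosen for illustration. Each `Future` is fetched inside the spawned expression, so `foo` receives a plain `Array`. As far as I understand, a `fetch` on the process that owns the value reads it locally, but that is an assumption worth checking rather than something tested here.

```julia
using Distributed
addprocs(2)                                  # assumption: two local workers
@everywhere using SharedArrays, LinearAlgebra

@everywhere foo(x::Array, y::SharedArray) = det(x) + sum(y)

S = SharedArray{Float64}(10)
fill!(S, 1.0)                                # illustrative contents

# Process 1 holds only Futures; each matrix lives on its worker.
refs = [@spawnat p rand(3, 3) for p in workers()]

# Run foo on the worker that owns each matrix, fetching the Future there.
results = [@spawnat p foo(fetch(r), S) for (p, r) in zip(workers(), refs)]

vals = fetch.(results)                       # collect one Float64 per worker
println(vals)
```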
tholy
  • In addition, using `@everywhere` to start processor-intensive jobs is usually suboptimal. You don't want your main controller process doing heavy work, which is what the `@everywhere` macro can lead to unless you add a lot of awkward code to avoid it. – Michael Ohlrogge Jun 07 '16 at 17:18
  • This does seem like better form, @tholy. However, it doesn't quite work for me. If I do this, `mat[i]` is a remote reference, and it remains one when passed via `@spawnat`. So I actually get an error on the worker processes if the first argument of `foo` is supposed to be an `Array` (I tried it!). I can fix it by replacing `mat[i]` with `fetch(mat[i])`. But does this actually pull the value of `mat[i]` back to process 1 and then send it to process p? – Leon Balents Jun 08 '16 at 04:16
  • I wasn't able to test that because I was on my phone. You should be able to add a `fetch` inside the function you're running. – tholy Jun 08 '16 at 19:20
  • @LeonBalents I don't think that spawn combined with fetch as you describe should do that. Spawn means the command is run on the worker. I haven't tried to explicitly test it though – Michael Ohlrogge Jun 09 '16 at 00:13