Elixir: start processes at very same time

Question

Let's say I have this module

defmodule Loader do

  def spawn_pools(0, _host, _iterations, pids) do
    launch!(pids) #something I want to achieve
  end

  def spawn_pools(pools, host, iterations, pids) do
    pid = spawn_link(__MODULE__, :siege, [host, iterations])
    spawn_pools(pools-1, host, iterations, [pid|pids])
  end

end

So if another module calls Loader.spawn_pools(10, host, iterations, []), it will spawn 10 processes, each executing the function siege.

The problem is that I want it to be as parallel as it can be: all processes should start executing at the very same moment.

So I thought of this

def siege(host, iterations) do
  receive do
    {:launch} -> #...
  end
end

But this brings me back to the same problem: now I need to send :launch to all these processes at the same time, which means recursion again, just another layer of the same problem.

P.S. I'm new to the Erlang/Elixir paradigm, so maybe I'm missing something?

Computers very rarely run a bunch of things at the same time. They do little bits of lots of tasks so quickly that you think they're at the same time. Elixir/Erlang do a great job of distributing work over all the available CPU cores, but each core is only going to be doing one thing at the "very same time." Even if this was possible, your actual parallelism would be limited to available CPU cores. — CoderDennis, Mar 15 '16 at 19:59
You're really asking about running several processes concurrently. Whether or not they all start in the same microsecond is sort of immaterial because even if they appear to do so, they don't really under the hood. I would just spin up your processes and let the VM worry about scheduling them. — Onorio Catenacci, Mar 15 '16 at 21:53

score 3 · Answer 1 · answered Mar 15 '16 at 19:40

Erlang and Elixir execute code sequentially within each process; since processes are spawned by other processes, spawning is inherently sequential. There's no way to synchronize the spawning of more than one process. Sending a message to each process to "synchronize" the start of its job has the same problem: sending a message is sequential, so the main process will still be sending messages one at a time. Even if you distribute the spawning or message sending over multiple processes, guaranteeing that they all start at the exact same time is basically impossible.

However, both message sending and process spawning are very fast operations, so the problem is usually small.
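A minimal sketch of the "spawn everything first, then release" idea, assuming a hypothetical module GateSketch and a caller-supplied work function (neither is from the question). The sends in phase 2 are still sequential, as described above; the only thing this buys is that no worker starts before every worker exists.

```elixir
defmodule GateSketch do
  # Hypothetical helper, for illustration only.
  def run(count, work) when is_integer(count) and count > 0 and is_function(work, 1) do
    parent = self()

    # Phase 1: spawn all workers; each blocks in `receive` until released.
    pids =
      for i <- 1..count do
        spawn_link(fn ->
          receive do
            :launch -> send(parent, {:done, self(), work.(i)})
          end
        end)
      end

    # Phase 2: release the workers one message at a time.
    Enum.each(pids, &send(&1, :launch))

    # Collect one result per worker, in spawn order.
    for pid <- pids do
      receive do
        {:done, ^pid, result} -> result
      end
    end
  end
end
```

For example, GateSketch.run(1000, fn i -> i * i end) spawns a thousand blocked workers before any of them runs the function.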

One approach is to take the current timestamp before spawning any process and pass it to every new process: each process then reads its own current timestamp, subtracts the initial one, and so learns how much later it was spawned. You can use this information with something like :timer.sleep/1 to try to emulate a synchronized start, but it's still subject to varying degrees of clock precision and whatnot :).
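A rough sketch of that timestamp idea, with a small variation: instead of the spawn time, a shared future start time is passed to every process, and each one sleeps for whatever remains. The module name DeadlineSketch, the delay_ms parameter and the work function are assumptions made for illustration. The delay has to be long enough for all spawns to complete, and the result is only as precise as the scheduler and timers allow.

```elixir
defmodule DeadlineSketch do
  # Hypothetical helper, for illustration only.
  def run(count, delay_ms, work) when count > 0 and delay_ms >= 0 do
    parent = self()
    # Shared target instant on the monotonic clock, in milliseconds.
    start_at = System.monotonic_time(:millisecond) + delay_ms

    pids =
      for i <- 1..count do
        spawn_link(fn ->
          # Time left until the shared start; never negative.
          wait = max(start_at - System.monotonic_time(:millisecond), 0)
          Process.sleep(wait)
          # How many milliseconds after the target this worker really started.
          lateness = System.monotonic_time(:millisecond) - start_at
          send(parent, {:done, self(), lateness, work.(i)})
        end)
      end

    # Returns {lateness_ms, result} per worker, in spawn order.
    for pid <- pids do
      receive do
        {:done, ^pid, lateness, result} -> {lateness, result}
      end
    end
  end
end
```

The reported lateness values give a quick way to check how close to "simultaneous" the start actually was on a given machine.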

Well, 10 processes was just an example; I was thinking of thousands of them. I don't care about microseconds, but about the case where half of the processes have already run while the other half aren't spawned yet (I want relative concurrency). I mean, there should be some standard solution pattern that Erlang/Elixir programmers use? — Joe Half Face, Mar 15 '16 at 19:49

score 0 · Accepted Answer · edited May 23 '17 at 12:14

The closest you can get is using a list comprehension. It's a language construct and therefore could theoretically be compiled to execute in parallel (it isn't, for reasons described later). See how the parallel_eval function is written in an official Erlang library. It essentially does something like this:

[spawn(fun() -> ReplyTo ! {self(), promise_reply, M:F(A)} end) || A <- ArgL]

an example of which you can see in my Erlang code.

If you think about it, it's impossible to start executing several processes exactly in parallel, because at the lowest level the physical CPU has to start executing each process sequentially. The Erlang VM needs to allocate a stack for the new process, which, according to the documentation, takes 309 words of memory. Then it needs to pass the initial parameters, add the process to the scheduler, etc. See also this thread, which contains more technical references explaining Erlang processes.

EDIT:

You can benchmark how long it takes to create one process. This simple code is a quick stab at two approaches:

-module(spawner).

-export([start1/1, start2/1]).

start1(N) ->
    %% The tree depth must be N so that the 4^N replies loop/3 waits for arrive.
    start_new1(erlang:monotonic_time(), self(), N),
    loop(round(math:pow(4, N)), 0, []).

start_new1(Start, Pid, N) ->
    Fun = fun() -> child(Start, Pid, N-1) end,
    [spawn(Fun) || _ <- lists:seq(1, 4)].

child(Start, Pid, 0) -> send_diff(Start, Pid);
child(Start, Pid, N) -> start_new1(Start, Pid, N).

loop(All, All, Acc) ->
    {All, lists:sum(Acc)/All, lists:min(Acc), lists:max(Acc)};
loop(All, Y, Acc) ->
    receive Time -> loop(All, Y+1, [Time|Acc]) end.

send_diff(Start, Pid) ->
    Diff = erlang:monotonic_time() - Start,
    Pid ! erlang:convert_time_unit(Diff, native, micro_seconds).


start2(N) ->
    All = round(math:pow(4, N)),
    Pid = self(),
    Seq = lists:seq(1, All),
    Start = erlang:monotonic_time(),

    Fun = fun() -> send_diff(Start, Pid) end,
    [spawn(Fun) || _ <- Seq],
    loop(All, 0, []).

start1/1 spawns a tree of processes: each process spawns 4 child processes. The argument is the number of generations, so there will be 4^N leaf processes (256 for N=4). start2/1 spawns the same number of leaf processes, but sequentially, one by one. In both cases the output is the process count followed by the average, minimum, and maximum time in microseconds from the start until a process runs (a leaf process, in the case of the tree).

1> c(spawner).
{ok,spawner}
2> spawner:start1(4).
{256,868.8671875,379,1182}
3> spawner:start2(4).
{256,3649.55859375,706,4829}
4> spawner:start2(5).
{1024,2260.6494140625,881,4529}

Note that in start1, apart from the leaf processes, there are many more supporting processes that live only to spawn children. It seems that the time from the start to each leaf child's start is shorter in the first case, but in my environment it didn't finish in a reasonable time for N=5. You could take this idea or something similar and tune N and the number of child processes spawned by each process according to your needs.
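For Elixir readers, here is a loose sketch of the same tree idea with both the depth and the fan-out as parameters. TreeSketch, fanout and work are names invented for this illustration; it is not a translation of the benchmark above and it measures nothing, it only shows the spawning shape.

```elixir
defmodule TreeSketch do
  # Hypothetical helper: spawns fanout^depth leaves through a tree of spawners.
  def run(depth, fanout, work) when depth >= 1 and fanout >= 1 do
    parent = self()
    grow(depth, fanout, parent, work)

    # Wait for one message per leaf process.
    leaves = round(:math.pow(fanout, depth))

    for _ <- 1..leaves do
      receive do
        {:leaf, result} -> result
      end
    end
  end

  # Last generation: these processes do the actual work.
  defp grow(1, fanout, parent, work) do
    for _ <- 1..fanout, do: spawn(fn -> send(parent, {:leaf, work.()}) end)
  end

  # Intermediate generations only spawn children and then exit.
  defp grow(depth, fanout, parent, work) do
    for _ <- 1..fanout, do: spawn(fn -> grow(depth - 1, fanout, parent, work) end)
  end
end
```

TreeSketch.run(5, 4, fn -> :ok end) would start 1024 leaf processes through five generations of spawners.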

What do you think of each process sending the previous one something like a `sleep_until_ready` message, so that my `siege` function starts actual execution if it hasn't received this message for 100 ms, and loops if it has? Looks like a possible solution to me — Joe Half Face, Mar 15 '16 at 20:06
Have you checked the last link in my answer? It says: We observe that the time taken to create an Erlang process is constant 1µs up to 2,500 processes; thereafter it increases to about 3µs for up to 30,000 processes. So spawning a thousand processes may take around 1ms. I would recommend that you do some benchmarking before attempting to optimize your code. Another trick you could use is to start a tree of processes instead of starting them sequentially, e.g. let each process start 4 child processes. So the first process starts 4 children, each of which starts its own 4 children, ... — Greg, Mar 15 '16 at 22:53
... and so on. Five generations of processes would start 1024 processes. So depending on how many processes you need, you could let the first five or six generations of processes die after they have created their children and only start the calculation in, say, the seventh generation. Some of these processes may be moved to other CPUs by the VM scheduler. Using a message to synchronize started processes, as you proposed, is another option, but it's likely that the scheduler will start executing some of the processes before the others have had their chance to process the message. — Greg, Mar 15 '16 at 22:59
