10

I have an array x[] containing data. Also there is an array of "system states" c[]. The process:

for(i = 1; i < N; i++)
{   
  a = f1(x[i] + c[i-1]);
  b = f2(x[i] + c[i-1]);
  c[i] = a + b;
}

Is there any efficient way to find the values of f1 and f2 on 2-core system using 2 parallel threads? I mean the following (in pseudo-code):

thread_1
{
    for(i = 1; i < N; i++)
      a = f1(x[i] + c[i-1]);    
}
thread_2
{
    for(i = 1; i < N; i++)
    {
      b = f2(x[i] + c[i-1]);
      c[i] = a + b;  //here we somehow get a{i} from thread_1
    }
}

f1 and f2 are not time consumptive, but have to be calculated many times, so desired speedup is about x2. See the diagram for graphical representation:

desired parallel process

Looking for code examples for Windows.

carimus
  • 153
  • 7

3 Answers3

4

If I understand you right,

  • a[i] can only be calculated when c[i-1] is available
  • b[i] can only be calculated when c[i-1] is available
  • c[i] is only available when a[i] and b[i] are calculated

It means that the only process which you can do separately is calculating a[i] and b[i].

That's how I see it in C#:

for (int i = 1; i < N; i++)
{
    Task<double> calcA = Task.Factory.StartNew(() => { return f1(x[i] + c[i-1]); });
    Task<double> calcB = Task.Factory.StartNew(() => { return f2(x[i] + c[i-1]); });

    // .Result will block the execution and wait for both calculations to complete
    c[i] = calcA.Result + calcB.Result; 
}

This will run two separate threads, which will calculate f1 and f2 respectively. After both f1 and f2 are calculated, it will set c[i] value, and run the next iteration.

Note that:

  • I use double, assuming that your f1 and f2 return double
  • The loop starts from 1, assuming that you have some initial a[0] and b[0] values. Otherwise, c[i-1] would throw an exception
  • This will only bring improvement if calculation of f1 and f2 is really resource-consuming and long, compared to other calculations
  • Task.Factory.StartNew (unlike using Thread) uses ThreadPool which means that it doesn't create a new thread every time, but reuses the existing from the pool. It noticably reduces the overhead.
Yeldar Kurmangaliyev
  • 33,467
  • 12
  • 59
  • 101
  • This will work incorrect, as loop variable is used in closure. You need to create a local copy – VMAtm Dec 25 '15 at 08:32
  • @VMAtm Since the task is declared, run and finished within the same loop iteration, I see no possibility of `i` modification. I may be wrong, of course... – Yeldar Kurmangaliyev Dec 25 '15 at 08:33
  • 1
    It woul be efficien only if f1 and f2 are very havy and syncronization overhead will less than profit of parallel run – gabba Dec 25 '15 at 08:35
  • @gabba Of course! Thank you for clarification. I have updated the answer and hope that the OP will follow your advice – Yeldar Kurmangaliyev Dec 25 '15 at 08:36
  • 1
    I think `WaitAll` is not necessary since `Result` will wait for task to do it's job. – Hamid Pourjam Dec 25 '15 at 08:36
  • You open new threads at each iteration, this solution is straigthforward. But is there any way to leave threads for all the time of computing and synchronize them e.g. with memory? – carimus Dec 25 '15 at 08:52
  • @carimus I can imagine some, but I believe that it won't bring any improvements. `Task.Factory` uses `ThreadPool` which means that it is not creating a new thread every time - it reuses the existing from the pool. – Yeldar Kurmangaliyev Dec 25 '15 at 08:57
  • 1
    `Parallel.Invoke` would be more efficient – Lucas Trzesniewski Dec 25 '15 at 13:12
4

The only parallel part in this algorithm is calculation of f1 and f2, but you say that f1 and f2 are not time consumptive, so it might be much better to use SIMD vectorization (e.g. System.Numerics.Vectors in C#) and run it on one core (that also reduce cache misses). Or probably you could modify your algorithm to be parallelizeable (but it might require hard work).

Valery Petrov
  • 653
  • 7
  • 19
2

Without going into a code solution, you want to use some kind of barrier. This allows to check if all participans have declared they are finished with the task. Thread 2 will have to wait for thread one in this example

https://en.wikipedia.org/wiki/Barrier_(computer_science) Example of C++ "Memory barrier"

Community
  • 1
  • 1
Houbie
  • 1,406
  • 1
  • 13
  • 25