
Can you please give a simple example of Parallel Programming?

I have a big for loop and I want to process it faster and use all the CPU cores. What can I do? Is this related to Parallel?

For example (calculating Pi digits):

var
  A: array of LongInt;
  I, J, K, P, Q, X, Nines, Predigit: Integer;
  NumDigits, PiLength: Integer;
  answer: string;

begin
  NumDigits := 5000;
  SetLength(A, 10*NumDigits div 3);
  SetLength(answer, NumDigits+1);
  PiLength := 1;
  // initialise the working array
  for I := Low(A) to High(A) do
    A[I] := 2;
  Nines := 0;
  Predigit := 0;
  for J := 0 to NumDigits-1 do // this is the loop I want to run on all cores
  begin
    // carry pass over the whole array; it works on the A[] values
    // left behind by the previous pass
    Q := 0;
    P := 2 * High(A) + 1;
    for I := High(A) downto Low(A) do
    begin
      X := 10*A[I] + Q*(I+1);
      A[I] := X mod P;
      Q := X div P;
      P := P - 2;
    end;
    A[Low(A)] := Q mod 10;
    Q := Q div 10;
    // release digits; a run of 9s is held back until the next digit
    // decides whether the 9s stay or roll over to 0s
    if Q = 9 then
      Inc(Nines)
    else if Q = 10 then
    begin
      answer[PiLength] := Chr(Predigit + 1 + Ord('0'));
      for K := 1 to Nines do
        answer[PiLength+K] := '0';
      PiLength := PiLength + Nines + 1;
      Predigit := 0;
      Nines := 0;
    end
    else
    begin
      answer[PiLength] := Chr(Predigit + Ord('0'));
      Predigit := Q;
      for K := 1 to Nines do
        answer[PiLength+K] := '9';
      PiLength := PiLength + Nines + 1;
      Nines := 0;
    end;
  end;

  answer[PiLength] := Chr(Predigit + Ord('0'));
end;

This code only uses one CPU core, which is of course because it runs in a single thread. How can I use all the CPU cores while I have only one main thread?

Update: If you believe this code can't be run in parallel, then see this example:

var
  i, a: Integer;
begin
  a := 0;
  for i := 0 to 100000 do
  begin
    a := a + 1;
  end;
end;
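
What I mean is: even though each iteration uses the previous value of a, the loop can be split into independent halves whose partial results are added afterwards. Just a sketch of the idea (the a1/a2 names and the split point are arbitrary, and this is not threaded code yet):

var
  i, a1, a2, a: Integer;
begin
  // first half of the range
  a1 := 0;
  for i := 0 to 50000 do
    a1 := a1 + 1;

  // second half of the range; this part needs nothing from the
  // first half, so it could run on another core at the same time
  a2 := 0;
  for i := 50001 to 100000 do
    a2 := a2 + 1;

  // combine the partial results: 100001, same as the original loop
  a := a1 + a2;
end;
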
Sky
  • You need to give us more information. A code example showing the loop and some idea of what your application is trying to do would be a start. – Andy_D Dec 04 '13 at 10:50
  • https://www.google.com/search?q=parallel+programming+delphi – Free Consulting Dec 04 '13 at 10:52
  • http://docwiki.embarcadero.com/CodeExamples/XE5/en/RTL.Threads_Sample – David Heffernan Dec 04 '13 at 10:53
  • What makes you think that algorithm can be run in parallel? Do you understand it yet? Where is your dependency analysis that says that the iterations are not dependent on each other? – David Heffernan Dec 04 '13 at 11:17
  • @DavidHeffernan Your link is about threads. I asked about Parallel. If you think this algorithm can't be run in parallel, how can I use all CPU cores for this algorithm? – Sky Dec 04 '13 at 12:21
  • @Sky How can you achieve parallel processing without either threads, or multiple processes? In other words, what do you mean by *Parallel*? Why do you believe that threads cannot be used to implement parallel processing? If your algo has strong dependencies then it's quite possible that you cannot use all CPUs. That's the dirty secret of the multi-core revolution. Most algos are not readily split into multiple parallel threads of execution. – David Heffernan Dec 04 '13 at 12:24
  • *How can I use all the CPU cores while I have only one main thread?* You cannot. – David Heffernan Dec 04 '13 at 12:25
  • Anyway, you didn't really answer my questions. I'll repeat them. What makes you think that algorithm can be run in parallel? Do you understand it yet? Where is your dependency analysis that says that the iterations are not dependent on each other? You really do need to think about all of those issues. Do be aware that you cannot wave a magic wand and make code run correctly on multiple processors. Perhaps this might be relevant also: http://stackoverflow.com/questions/10890613/fast-algorithm-to-calculate-pi-in-parallel – David Heffernan Dec 04 '13 at 12:28
  • @DavidHeffernan Well, I don't know much about parallels and that's why I'm asking. I tried to run that in parallel and I couldn't. And of course I understand the algorithm, otherwise I wouldn't use it as an example. You are saying that all 'for loops' can't be run in parallel, right? – Sky Dec 04 '13 at 12:36
  • You talk about *parallels* as if it has a special meaning. What is that special meaning? If iteration `i` depends on the result of iteration `i-1`, then you have to wait until `i-1` is done. – David Heffernan Dec 04 '13 at 12:41
  • @Sky I still don't know what *parallels* means. I do suggest that you study this algorithm in some detail. One thing you might try would be to move the code that initialise `A[]` inside the for loop on J. That breaks the dependency that comes through `A[]`. But it also makes the output wrong. – David Heffernan Dec 04 '13 at 12:48
  • Regarding your edit, you make a good point. Now you just need to do the same for your algo. Good luck! – David Heffernan Dec 04 '13 at 12:52
  • I don't understand the significance of the last snippet, if you break apart that loop you'll end up with `a:=100000+1` – Sertac Akyuz Dec 04 '13 at 13:02
  • @SertacAkyuz What Sky means is that the for loop in the final snippet has dependencies from one iteration to the next. And yet the loop can be written in a parallel form. Obviously it can be evaluated statically. A better example would be a loop to calculate the sum `x[i]`. Made parallel with classic fork/join. But the key point is that loop can be performed in any order. Anyway, I think my answer's para 2 covers it all. It is certainly plausible that the algo in the Q could be made parallel. But as it stands, you cannot use parallel for on it. – David Heffernan Dec 04 '13 at 13:06
  • `a = 1 * 100001` and so what? Eliminate recurrence first. – Free Consulting Dec 04 '13 at 13:12
  • http://www.experimentalmath.info/bbp-codes/bbp-alg.pdf – Free Consulting Dec 04 '13 at 13:17
  • this can be done: http://www.xtremesystems.org/forums/showthread.php?221773-New-Multi-Threaded-Pi-Program-Faster-than-SuperPi-and-PiFast – whosrdaddy Dec 04 '13 at 13:19
  • @whosrdaddy Which algo does that link use? The same as in the Q? I think that's crucial here. – David Heffernan Dec 04 '13 at 13:22
  • I get a "0" as the first digit. Is that by design? – JensG Dec 04 '13 at 18:43
  • @JensG Yes, Because of `Predigit := 0;` – Sky Dec 04 '13 at 19:07

1 Answer


Your algorithm has strong dependencies between consecutive iterations. The dependency can most readily be seen in the array A[]. This means that you cannot execute the for loop in the question in parallel: you need to know the outcome of iteration I-1 before you can start iteration I. And as well as the obvious dependency that flows through A[], there may well be others.
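
To see the difference in miniature (a toy sketch only, nothing to do with your spigot code; the array names are made up), compare these two loops. The first cannot be run out of order, because every iteration needs the value that the previous iteration produced. The second could be handed out to several threads, because each iteration stands alone:

var
  RunningTotal, Squares: array[0..9] of Integer;
  I: Integer;
begin
  // carried dependency: iteration I reads what iteration I-1 wrote,
  // so the iterations must execute strictly in order
  RunningTotal[0] := 0;
  for I := 1 to 9 do
    RunningTotal[I] := RunningTotal[I - 1] + I;

  // no carried dependency: each iteration uses only I, so the
  // iterations could run in any order, even on different cores
  for I := 0 to 9 do
    Squares[I] := I * I;
end;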

It is conceivable that you could re-work the algorithm to break apart these dependencies. However, it is also quite conceivable that this algorithm is simply not amenable to parallel execution; it is perfectly normal for an algorithm to be impossible to express in a parallel way. You would need to study this algorithm in some detail to work out whether or not it can be re-cast in a parallel form.

There are many different algorithms for calculating π. Because of the obvious potential for parallel processing, modern research into π digit extraction is surely focused on parallel algorithms. I'm sure that a web search will reveal plenty of algorithms that are suitable for parallel execution.
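
Purely as an illustration of the shape of such code (this is not a fast modern π algorithm, and it is not your spigot algorithm), a series with independent terms, such as the Leibniz series, can be summed with a classic fork/join: each thread sums its own range of terms and the partial sums are combined at the end. A rough sketch using plain TThread; the worker class, thread count and term count below are arbitrary choices:

program ParallelLeibnizPi;

{$APPTYPE CONSOLE}

uses
  Classes;

type
  // Each worker sums the Leibniz terms (-1)^K / (2K+1) for its own
  // range of K; the ranges are independent, so no thread ever needs
  // another thread's results.
  TPartialSum = class(TThread)
  private
    FFirst, FLast: Integer;
    FSum: Double;
  protected
    procedure Execute; override;
  public
    constructor Create(AFirst, ALast: Integer);
    property Sum: Double read FSum;
  end;

constructor TPartialSum.Create(AFirst, ALast: Integer);
begin
  FFirst := AFirst;
  FLast := ALast;
  inherited Create(False); // start running once construction completes
end;

procedure TPartialSum.Execute;
var
  K: Integer;
begin
  FSum := 0.0;
  for K := FFirst to FLast do
    if Odd(K) then
      FSum := FSum - 1.0 / (2 * K + 1)
    else
      FSum := FSum + 1.0 / (2 * K + 1);
end;

const
  NumTerms = 100000000;  // more terms, more (slowly gained) accuracy
  NumThreads = 4;        // pick roughly the number of cores
var
  Workers: array[0..NumThreads - 1] of TPartialSum;
  I: Integer;
  Total: Double;
begin
  // fork: hand each thread an independent chunk of the range
  for I := 0 to NumThreads - 1 do
    Workers[I] := TPartialSum.Create(I * (NumTerms div NumThreads),
      (I + 1) * (NumTerms div NumThreads) - 1);

  // join: wait for every thread, then combine the partial sums
  Total := 0.0;
  for I := 0 to NumThreads - 1 do
  begin
    Workers[I].WaitFor;
    Total := Total + Workers[I].Sum;
    Workers[I].Free;
  end;

  Writeln('pi is approximately ', 4 * Total:1:10);
end.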

David Heffernan
  • The algorithm can be split into two parts quite easily: The first part produces the Q values, and the second part processes them. The problem is, the first part of the algorithm is slower than the second part of the calculation. So instead of working in parallel, the second thread is most of the time just waiting for more Qs handed over from the first thread, effectively reducing any multithread gain to zero (or negative). And that solution is still far from "using all cores" (except for the case when there are only two cores at all :-) – JensG Dec 04 '13 at 20:46
  • @JensG This is the type of analysis that needs to be performed. It's exactly what I mean by my second paragraph. I would not be surprised if the algo could be decomposed further. I also would not be surprised if this algo had limited opportunities for parallelisation. – David Heffernan Dec 04 '13 at 20:48