Loop parallelization

Question

I have been vainly trying to solve the following problem for hours: "Make the neccessary loop transformations so that parallel executions are possible"

do i=1,n
   do j=1,n
       A[i,j]=A[i-3,j-3]+A[i-4,j-4]
   end do
end do

Unfortunately,my loop transformation knowledge is poor.Therefore I will probably need a detailed explanation,if possible.Thank you in advance for your help!

Stackoverflow is not a "do my homework" site. Have you tried anything so far? Are you using a specific framework for parallel execution? — VHarisop, Aug 31 '15 at 22:15
Well,I have tried these:http://www.cs.colostate.edu/~cs560/Spring2012/ClassNotes/lecture06-parallelization.ppt.pdf , http://stackoverflow.com/questions/23096653/how-is-loop-skewing-making-loop-parallelizable , http://cs.mwsu.edu/~passos/caesar/tutorial1/sld041.htm. I must say that I learned a lot of things,but still,I am not able to find a solution.No specific framework is given. — user3697730, Aug 31 '15 at 22:18

score 0 · Answer 1 · answered Jul 10 '17 at 05:54

Your code perfectly fits into a so called Polyhedral framework for automatic loop transformations and locality optimizations. Refer PLUTO compiler which can automatically parallelize your code. PLUTO transforms your code to:

for (t1=0;t1<=floord(N-1,16);t1++) {
    lbp=max(0,ceild(32*t1-N+1,32));
    ubp=min(floord(N-1,32),t1);
    #pragma omp parallel for private(lbv,ubv,t3,t4)
    for (t2=lbp;t2<=ubp;t2++) {
        for (t3=32*t1-32*t2;t3<=min(N-1,32*t1-32*t2+31);t3++) {
        lbv=32*t2;
        ubv=min(N-1,32*t2+31);
        #pragma ivdep
        #pragma vector always
        for (t4=lbv;t4<=ubv;t4++) {
           A[t3][t4]=A[t3-3][t4-3]+A[t3-4][t4-4];;
        }
    }
}

You can see that tiling followed skewing helps to extract two levels of parallelism. More details can be found at: http://pluto-compiler.sourceforge.net/

Loop parallelization

1 Answers1