4

I am trying to solve LU decomposition with the Doolittle Algorithm – according to this document. Without parallelization, code works fine. However, I would like to make this code run in parallel - trying to make a parallel outer and two inner loops. Unfortunately I am still missing something. If I tried to first make the outer loop run in parallel, I received a bad result.

I tried to detect where there might be a collision. I locked those places afterwards, but I still did not receive the right result. I added them to the copied code as comments. What am I doing wrong, do I need lock out other places?

What´s the right implementation of outer loop?

How would inner loops look like?

Thank you in advance.



Implementation of algorithm (sequential)

        //upper triangle
        var upper = new double[arr.GetLength(0), arr.GetLength(0)];
        //lower triangle
        var lower = new double[arr.GetLength(0), arr.GetLength(0)];

        //field initialization
        for (int i = 0; i < n; i++)
        {
            for (int j = i; j < n; j++)
                upper[i, j] = arr[i, j];
            for (int j = i + 1; j < n; j++)
                lower[j, i] = arr[j, i];
            lower[i, i] = 1;
        }

        for(int i=0; i<n; i++)
        {
            for (int j = i; j < n; j++)
            {
                for (int k = 0; k < i; k++)
                { 
                    upper[i, j] = upper[i, j] - (lower[i, k] * upper[k, j]);
                }
            }

            for (int j = i + 1; j < n; j++)
            {
                for (int k = 0; k < i; k++)
                {
                    lower[j, i] = lower[j, i] - (lower[j, k] * upper[k, i]);
                }
                    lower[j, i] = lower[j, i] / upper[i, i];
            }
        }

Implementation of algorithm (parallel)

        //upper triangle
        var upper = new double[arr.GetLength(0), arr.GetLength(0)];
        //lower triangle
        var lower = new double[arr.GetLength(0), arr.GetLength(0)];

        //field initialization
        for (int i = 0; i < n; i++)
        {
            for (int j = i; j < n; j++)
                upper[i, j] = arr[i, j];
            for (int j = i + 1; j < n; j++)
                lower[j, i] = arr[j, i];
            lower[i, i] = 1;
        }

        //making outer loop parallel
        Parallel.For(0, n, (i, state) =>
        {
            //possibly make this loop parallel also
            for (int j = i; j < n; j++)
            {
                for (int k = 0; k < i; k++)
                {
                    //lower[i,k] is it potential problem?
                    /*
                     * I tried this solution
                     * double a;
                     * lock(lowerLock){
                     *   a = lower[i,k];
                     * }
                     * upper[i, j] = upper[i, j] - (a * upper[k, j])
                     */
                    upper[i, j] = upper[i, j] - (lower[i, k] * upper[k, j]);
                }
            }

            //possibly make this loop parallel also
            for (int j = i + 1; j < n; j++)
            {
                for (int k = 0; k < i; k++)
                {
                    //upper [k,i] is it potential problem?
                    /*
                     * I tried this solution
                     * double b;
                     * lock(upperLock){
                     *   b = upper[k, i];
                     * }
                     * lower[j, i] = lower[j, i] - (lower[j, k] * b);
                     */
                    lower[j, i] = lower[j, i] - (lower[j, k] * upper[k, i]);
                }
                    lower[j, i] = lower[j, i] / upper[i, i];
            }
        });

sequential right result

Concatenation  Upper triangle  Lower triangle
 2 -1 -2       2 -1 -2          1  0  0
-2  4 -1       0  4 -1         -2  1  0
-2 -1  3       0  0  3         -2 -1  1

parallel bad result

Concatenation  Upper triangle    Lower triangle
 2 -1 -2       2 -1 -2             1  0  0
-2  4 -1       0  4 -1            -2  1  0
-2 -1  3       0  0 10 -->BAD     -2 -1  1

EDIT I tried to lock all the approaches to the fields with one lock. I am aware of loosing all parallelization in this way. However, I wanted to achieve at least the right result, unfortunately without success.

 static object mylock = new object();
            //making outer loop parallel
            Parallel.For(0, n, (i, state) =>
            {
                for (int j = i; j < n; j++)
                {
                    for (int k = 0; k < i; k++)
                    {
                        lock (mylock)
                        {
                            upper[i, j] = upper[i, j] - (lower[i, k] * upper[k, j]);
                        }
                    }
                }

                for (int j = i + 1; j < n; j++)
                {
                    for (int k = 0; k < i; k++)
                    {
                        lock (mylock)
                        {
                            lower[j, i] = lower[j, i] - (lower[j, k] * upper[k, i]);
                        }
                    }
                    lock (mylock)
                    {
                        lower[j, i] = lower[j, i] / upper[i, i];
                    }
                }
            });
Arsiwaldi
  • 513
  • 1
  • 7
  • 23
  • 2
    One thing that catches eye is you modify your `upper` and `lower` arrays, and also read from them, from multiple threads without any synchronization (like `lock`). I'd start with surrounding all access to upper and lower in `lock` and see if that fixes bad result. Of course it's not a solution, because it will destroy parallelization effect, but just to confirm the issue. – Evk Dec 01 '17 at 21:40
  • 1
    It would probably be better to include the results as text rather than images. – David Conrad Dec 01 '17 at 21:41
  • `Without parallelization, code works fine.` Please show us your 'non parallel' code that works fine. – mjwills Dec 01 '17 at 21:47
  • Correct me if I'm wrong, but `upper[i, j] = upper[i, j] - (lower[i, k] * upper[k, j]);` looks like each loop is dependent of previous steps. I.e. when `i = 2`, to calculate `upper[2, 2]`, `upper[1, 2]` is being used, which is calculated in the loop of `i=1`. – Mike Mat Dec 01 '17 at 22:00
  • @Evk Even though I have locked all the accesses with one lock, I get a bad result. – Arsiwaldi Dec 02 '17 at 11:18
  • @MikeMat Yes, you are right. Loop is dependent on previous, as you said. – Arsiwaldi Dec 02 '17 at 11:19

1 Answers1

2

The parallel loops write to the same array, right?

upper[i, j] = upper[i, j] - (lower[i, k] * upper[k, j]);

But it is not defined, when which loop will write to a position in your array. So two loops will NOT write to the same index but read from an index, where the other loop might have already written to. You can't parallize your algorithm this way.

Olaf
  • 101
  • 1
  • 6
  • Ok, I get your point. Is it even possible to make this algorithm run in parallel in certain way? I'll try to look at some other possible solution – Arsiwaldi Dec 02 '17 at 11:41
  • What is the usual maximum value of of your n? – Olaf Dec 03 '17 at 13:44
  • And btw. are you shure, your loop for (int k = 0; k < i; k++) { upper[i, j] = upper[i, j] - (lower[i, k] * upper[k, j]); } is necessary/correct? I didn't see that yesterday. – Olaf Dec 03 '17 at 14:06
  • Excuse me for late answer, actually I am working on new solution. Well, maximum value of n is not defined. Matrix could have random n. And yes, I am sure ( I took this algorithm from given document. Link at the top). – Arsiwaldi Dec 05 '17 at 20:36
  • Ok, parallelization only makes sense, if n is large enough. Otherwise you pay more than you gain. That said you can do the following: use a second temporary matrix (upper2) in addition to matrix upper. Whenever you write to upper[i,j], write to upper2[i,j]. When ALL your loops are finished, replace upper by upper2. Do the same with lower. – Olaf Dec 05 '17 at 21:39