Speed up recursive determinant algorithm

Question

How do I speed up this recursive function? When it reaches a 10x10 matrix, it takes up a minute or so just to solve a problem. I included the event function as well so you can see when the calculation would take place.

void determinantsFrame::OnCalculateClick(wxCommandEvent &event)
{
    double elem[MAX][MAX]; double det; string test; bool doIt = true;
    for (int i = 0; i < n; i++)
    {
        for (int j = 0; j < n; j++)
        {
            test = (numbers[i][j]->GetValue()).mb_str();
            if (test == "")
            {
                doIt = false;
                break;
            }

            for (int k = 0; k < test.length(); k++)
                if (isalpha(test[k]) || test[k] == ' ')
                {
                    doIt = false;
                    break;
                }
                else if (ispunct(test[k]))
                {
                    if (test[k] == '.' && test.length() == 1)
                        doIt = false;
                    else if (test[k] == '.' && test.length() != 1)
                        doIt = true;
                    else if (test[k] != '.')
                        doIt = false;
                }

            if (doIt == false)
                break;
        }
        if (doIt == false)
            break;
    }

    if (doIt)
    {
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                elem[i][j] = static_cast<double>(wxAtof(numbers[i][j]->GetValue()));

        det = determinant(elem, n);
        wxMessageBox(wxString::Format(wxT("The determinant is: %.4lf"),det));
    }
    else
        wxMessageBox(wxT("You may have entered an invalid character. Please try again"));
}

double determinantsFrame::determinant(double matrix[MAX][MAX], int order) // Here's the recursive algorithm
{
    double det = 0; double temp[MAX][MAX]; int row, col;

    if (order == 1)
        return matrix[0][0];
    else if (order == 2)
        return ((matrix[0][0] * matrix[1][1]) - (matrix[0][1] * matrix[1][0]));
    else
    {
        for (int r = 0; r < order; r++)
        {
            col = 0; row = 0;
            for (int i = 1; i < order; i++)
            {
                for (int j = 0; j < order; j++)
                {
                    if (j == r)
                        continue;

                    temp[row][col] = matrix[i][j];
                    col++;

                    if (col == order - 1)
                        col = 0;
                }
                row++;
            }
            det = det + (matrix[0][r] * pow(-1, r) * determinant(temp, order - 1));
        }
        return det;
    }
}

Consider replacing the recursive call with `stack<>`-based data recursion. It'll eliminate some `JMP` and stack frame overhead. Also, have you run this through a profiler? Where is the performance bottleneck? You're also allocating a new `temp` for every call. Is that really necessary? — 3Dave, Nov 06 '14 at 14:57
If you don't get any useful answers here, you might try at http://codereview.stackexchange.com/. However, I consider this question relevant here, so I wouldn't close it. — 3Dave, Nov 06 '14 at 14:59
It would help greatly if this was a [MCVE](http://stackoverflow.com/help/mcve). What is `MAX`, `numbers[][]`, `n` etc.... The size of the matrix will probably determine what type of optimization is needed (3x3 matrix is going to be optimized much differently than a 30000x30000). — uesp, Nov 06 '14 at 15:03
This question appears to be off-topic because it belongs on http://codereview.stackexchange.com/ since it is about improving working code. — Baum mit Augen, Nov 06 '14 at 15:18
I believe the determinant of a matrix is usually calculated using the [LU-decomposition](https://en.wikipedia.org/wiki/LU_decomposition). This is however still pretty costly, you should not compute the determinant of a matrix for any reason but wanting to know the determinant. For all of its uses (I can think of), there are better solutions. — Baum mit Augen, Nov 06 '14 at 15:23
This is for our project in school. Our task is to make a program that can compute for the determinant using expansion by minors. — Jessie, Nov 06 '14 at 15:28
I can't replicate your performance. For `order == 10` I get run times just over 100 ms, though performance will depend a little on the contents of the matrix. It blows up quickly past that since this algorithm is `O(n!)` (factorial). — uesp, Nov 06 '14 at 15:32
I think it might be `O(n!+n^3)` (n! from the recursion and n^3 from the three nested loops) which would decay into `O(n!)`. The run times of `determinant()` are definitely close to `order!` anyways: 10=60ms, 11=600ms, 12=7700ms, 13=101000ms — uesp, Nov 06 '14 at 15:43
The timings certainly seems like N! but the loops says (N*N*N)*((N-1)*(N-1)*(N-1))*((N-2) ... which can be written as (N*(N-1)*(N-2)...)*(N*(N-1)*...) so N!*N!*N! or N!^3 so I wonder where my reasoning fails here. — Surt, Nov 06 '14 at 15:58

score 0 · Answer 1 · edited May 23 '17 at 12:20

It could be a branch mispredict problem (see also). The test

if (col == order - 1)
    col = 0;

Is not needed as far as I can see.

The test fails 1/order times per loop and dominates for small order, which is why larger N aren't so affected. The timing is still large O(N!^3) (afaik) so don't expect miracles.

        col = 0; row = 0;
        for (int i = 1; i < order; i++) {
            for (int j = 0; j < order; j++) {
                if (j == r)
                    continue;

                temp[row][col] = matrix[i][j];
                col++;

                //if (col == order - 1)
                //    col = 0;
            }
            col = 0; // no need to test
            row++;
        }

The algorithm will get a further slowdown when it hit L2 cache, at latest at N=64.

Also the matrix copy might be ineffective, this could be far more effective for large order at the cost of low effectiveness at low order.

    for (int r = 0; r < order; r++) {
        row = 0;
        for (int i = 1; i < order; i++) {
            memcpy(temp[row], matrix[i], r*sizeof(double)); // if r==0 will this work?
            memcpy(&temp[row][r], &matrix[i][r+1], (order-r-1)*sizeof(double));
            // amount of copied elements r+(order-r-1)=order-1.

            row++;
        }

Make a test with the original code to get the determinant that I got the indexes right!

score 0 · Accepted Answer · answered Nov 06 '14 at 20:30

You can do a bit better with keeping the same algorithm but it is at least O(n!) (probably worse) so higher order matrices will be slow no matter how much you optimize it. Note I did the benchmark times in MSVC 2010 and are there only for rough comparison purposes. Each change is cumulative as you go down the list and is compared to the original algorithm.

Skip Col Check -- As Surt suggested, removing this gets us a speed increase of 1%.
Add 3x3 Case -- Adding another explicit check for a 3x3 matrix gets us the most, 55%
Change pow() -- Changing the pow() call to (r % 2 ? -1.0 : 1.0) gets us a little bit more, 57%
Change to switch -- Changing the order check to a switch gets us a little bit more, 58%
Add 4x4 Case -- Adding another explicit check for a 4x4 matrix gets more, 85%

Things that don't work include:

memcpy -- As Surt suggested this actually looses a good deal of speed, -100%
Threads -- Creating order threads doesn't work well at all, -160%

I was hoping that using threads could get us a significant performance increase but even with all the optimization it is slower than the original. I think the copying of all the memory is making it not very parallel.

Added the 3x3 and 4x4 cases has the most effect and are the primary reason for the over x6 increase in speed. In theory you could add more explicit cases (probably by creating a program to output the required code) to reduce the speed even further. Of course, at some point this kind of defeats the purpose of using a recursive algorithm to begin with.

To get more performance you would probably have to consider a different algorithm. In theory you can change the recursive function into an iterative one by managing your own stack but it is considerable work and you aren't guaranteed a performance increase anyways.

Speed up recursive determinant algorithm

2 Answers2