Optimising Matrix Calculation in C

Question

I'm trying to find the most efficient method for multiplying a Matrix by its transpose. Any ideas on the most efficient mathematical formula?

With modern compilers, your versions are equivalent. But you should use the ikj ordering to improve cache use. It may make a *big* difference. See for instance this question https://stackoverflow.com/questions/9936132/why-does-the-order-of-the-loops-affect-performance-when-iterating-over-a-2d-arra — Alain Merigot, May 30 '19 at 07:48
If you multiply a matrix by its inverse you get the identity matrix, so there shouldn't be any reason to multiply at all. — Brendan, May 30 '19 at 07:52
This question is not for this forum, please post on : https://codereview.stackexchange.com/ — Vagish, May 30 '19 at 09:20
@Brendan The OP may have written "inverse", but I guess they meant *transposed* — Bob__, May 30 '19 at 09:32
Honestly, if this is production quality at all, use a library that will do this better. — Michael Dorgan, May 31 '19 at 00:02
I would write n * n only once: unsigned int nn = n * n; ..., And I would definitely use const keyword if n is const. — alirakiyan, May 02 '21 at 10:43

Norbert Incze · Answer 1 · 2019-05-30T18:23:28.320

0

You can improve your first solution, by initializing A and B matrix in the same for loop.

n * k is also calculated two times, you can store it into a variable to save some time.

It is better to use B[i + n * j] += ... instead of B[i + n * j] = B[i + n * j] + ..., because in the first one, B[i + n * j] is read once, when at the second on it is read twice.

void print_unmodified()
{
    unsigned int n = 64u;
    unsigned int A[N_MAX];
    unsigned int B[N_MAX];

    /* Initializing the A and B matrix with values 1->64^2 */
    for (int i = 0; i < (n * n); i++)
    {
        A[i] = i + 1;
        B[i] = 0u;
    }

    /* Matrix Multiplication B = A*A' */
    for (int i = 0; i < n; i++)
    {
        for (int j = 0; j < n; j++)
        {
            int index = i + n*j;
            for (int k = 0; k < n; k++)
            {
                int p = n * k;
                B[index] += A[i + p] * A[j + p];
            }
        }
    }
}

edited May 30 '19 at 18:23

answered May 30 '19 at 08:01

Norbert Incze

351
3
11

Both versions are n^3. Why do you think second version is n^4? – Alain Merigot May 30 '19 at 08:09
You are correct, both are O(n^3). I misread the incremental part. – Norbert Incze May 30 '19 at 09:20
Since B is full of 0's he could initialise it with calloc or memset if he's under linux – Gerardo Zinno May 30 '19 at 17:06

Bob__ · Answer 2 · 2019-06-01T12:51:18.007

The posted code gives the correct answer, a matrix B which is the product of another matrix A by its transposed, but both matrices are stored in a couple of arrays in column-major order (or at least, that's what what the OP states in the comments).

Considering this snippet:

for (int i = 0; i < n; i++) {
    for (int j = 0; j < n; j++) {
        for (int k = 0; k < n; k++) {
            B[i + n * j] = B[i + n * j] + A[i + n * k] * A[j + n * k];
        //                                  ^^^^^^^^^      ^^^^^^^^^
        }
    }
}

We can notice that the inner loop traverses those arrays skipping n elements every time and that's not cache-friendly.

If we arrange the order of the loops so that the inner one iterates over contiguous elements, we could have a performance gain:

// Calculate the matrix B = A times A transposed. A is an n x n matrix.
// A and B are stored in plain arrays ('src' and 'dest') in column-major order 
void mult_AAT(size_t n, value_t const *src, value_t *dest)
{
    for (size_t i = 0, end = n * n; i < end; i++)
    {
        dest[i] = 0;
    }
    for (size_t k = 0; k < n; k++)         // <--
    {
        for (size_t j = 0; j < n; j++)     // <-
        {
            for (size_t i = 0; i < n; i++)
            {
                dest[j * n + i] += src[k * n + i] * src[k * n + j];
                //   ^^^^^^^^^         ^^^^^^^^^        ^^^^^^^^^     
            }
        }
    }
}

Live HERE.

@SamTalbot Thanks for having edited your question, but there are some other info that I think are missing to obtain a proper answer. You may have noted, in the link I posted, that I stored the matrices *row-wise* in the arrays (e.g. look at the output of the first 2x2 matrix). Is that what you need or do you have to store them *column-wise*? In the same first example, my resulting B matrix is different from your algorithm's, which one is the "correct" one? The logic behind that function is that the j column of the transposed of A is just the j row of A and iterating that way is cache-friendly. — Bob__, May 31 '19 at 07:54
@SamTalbot Well, that wasn't a minor detail ;). Answer changed accordingly. — Bob__, Jun 01 '19 at 12:52

Optimising Matrix Calculation in C

2 Answers2