2

I'm trying to find the most efficient method for multiplying a Matrix by its transpose. Any ideas on the most efficient mathematical formula?

  • With modern compilers, your versions are equivalent. But you should use the ikj ordering to improve cache use. It may make a *big* difference. See for instance this question https://stackoverflow.com/questions/9936132/why-does-the-order-of-the-loops-affect-performance-when-iterating-over-a-2d-arra – Alain Merigot May 30 '19 at 07:48
  • 1
    If you multiply a matrix by its inverse you get the identity matrix, so there shouldn't be any reason to multiply at all. – Brendan May 30 '19 at 07:52
  • This question is not for this forum, please post on : https://codereview.stackexchange.com/ – Vagish May 30 '19 at 09:20
  • 1
    @Brendan The OP may have written "inverse", but I guess they meant *transposed* – Bob__ May 30 '19 at 09:32
  • Honestly, if this is production quality at all, use a library that will do this better. – Michael Dorgan May 31 '19 at 00:02
  • I would write n * n only once: unsigned int nn = n * n; ..., And I would definitely use const keyword if n is const. – alirakiyan May 02 '21 at 10:43

2 Answers2

0

You can improve your first solution, by initializing A and B matrix in the same for loop.

n * k is also calculated two times, you can store it into a variable to save some time.

It is better to use B[i + n * j] += ... instead of B[i + n * j] = B[i + n * j] + ..., because in the first one, B[i + n * j] is read once, when at the second on it is read twice.

void print_unmodified()
{
    unsigned int n = 64u;
    unsigned int A[N_MAX];
    unsigned int B[N_MAX];

    /* Initializing the A and B matrix with values 1->64^2 */
    for (int i = 0; i < (n * n); i++)
    {
        A[i] = i + 1;
        B[i] = 0u;
    }

    /* Matrix Multiplication B = A*A' */
    for (int i = 0; i < n; i++)
    {
        for (int j = 0; j < n; j++)
        {
            int index = i + n*j;
            for (int k = 0; k < n; k++)
            {
                int p = n * k;
                B[index] += A[i + p] * A[j + p];
            }
        }
    }
}
Norbert Incze
  • 351
  • 3
  • 11
0

The posted code gives the correct answer, a matrix B which is the product of another matrix A by its transposed, but both matrices are stored in a couple of arrays in column-major order (or at least, that's what what the OP states in the comments).

Considering this snippet:

for (int i = 0; i < n; i++) {
    for (int j = 0; j < n; j++) {
        for (int k = 0; k < n; k++) {
            B[i + n * j] = B[i + n * j] + A[i + n * k] * A[j + n * k];
        //                                  ^^^^^^^^^      ^^^^^^^^^
        }
    }
}

We can notice that the inner loop traverses those arrays skipping n elements every time and that's not cache-friendly.

If we arrange the order of the loops so that the inner one iterates over contiguous elements, we could have a performance gain:

// Calculate the matrix B = A times A transposed. A is an n x n matrix.
// A and B are stored in plain arrays ('src' and 'dest') in column-major order 
void mult_AAT(size_t n, value_t const *src, value_t *dest)
{
    for (size_t i = 0, end = n * n; i < end; i++)
    {
        dest[i] = 0;
    }
    for (size_t k = 0; k < n; k++)         // <--
    {
        for (size_t j = 0; j < n; j++)     // <-
        {
            for (size_t i = 0; i < n; i++)
            {
                dest[j * n + i] += src[k * n + i] * src[k * n + j];
                //   ^^^^^^^^^         ^^^^^^^^^        ^^^^^^^^^     
            }
        }
    }
}

Live HERE.

Bob__
  • 12,361
  • 3
  • 28
  • 42
  • Thanks for much for your response - It was very helpful! –  May 31 '19 at 00:02
  • @SamTalbot Thanks for having edited your question, but there are some other info that I think are missing to obtain a proper answer. You may have noted, in the link I posted, that I stored the matrices *row-wise* in the arrays (e.g. look at the output of the first 2x2 matrix). Is that what you need or do you have to store them *column-wise*? In the same first example, my resulting B matrix is different from your algorithm's, which one is the "correct" one? The logic behind that function is that the j column of the transposed of A is just the j row of A and iterating that way is cache-friendly. – Bob__ May 31 '19 at 07:54
  • @SamTalbot Well, that wasn't a minor detail ;). Answer changed accordingly. – Bob__ Jun 01 '19 at 12:52