0

During my internship, I have to use nested for loops, and I wrote something like this (in C) :

        for (int N_dt0_s_dt = 0; N_dt0_s_dt < N_dt; N_dt0_s_dt++)
        {
            for (int j = 0; j < N; j++)
            {
                for (int i = 0; i < N0; i++)
                {
                    Matrix_t0[i + N0 * j] += M_t_t0_loc[i + N0 *(N_dt0_s_dt + j)];
                }
            }
        }

Here, N and N0 are integers representing the matrix dimensions. N and N0 are between 200 and 500, depending on the input matrix. During the review of the code, some coworker suggested the following changes :

        int dec,dec1;//to compute index
        for (int N_dt0_s_dt = 0; N_dt0_s_dt < N_dt; N_dt0_s_dt++)
        {
            for (int j = 0; j < N; j++)
            {
                dec = N0 * j;
                dec1 = N0 *(N_dt0_s_dt + j);
                for (int i = 0; i < N0; i++)
                {
                    Matrix_t0[i + dec] += M_t_t0_loc[i + dec1];
                }
            }
        }
    }

The coworker told me the latter should run faster, because we are not computing the operations needed to come up with the N0 *(N_dt0_s_dt + j) at each iteration.

I think it is pretty legitimate, but I was asking myself if the compiler (Visual Studio) was already doing optimizations which would make those changes useless.

Some thoughts on this would be appreciated.

Ethan B.
  • 21
  • 4
  • It would be easier to assess if you showed how N and N0 are defined. There's a good chance an optimizing compiler will do what you're doing by hand. The only reliable ways to determine benefit are to look at the generated assembler or measure the running code. The definitions of `dec` and `dec1` could perfectly well be inside the `for` loop — with no performance penalty. – Jonathan Leffler Jul 18 '18 at 23:35
  • Ok, I just defined N and N0. I don't know much of assembler though, but you seem confident about the fact that there is no performance penalty. Why is that ? In the original code, am I not computing N0*j N0*N time, when I only need to compute it j time ? It seems to me there are more opertations to do, isn't there ? – Ethan B. Jul 18 '18 at 23:45
  • As noted in my answer, Common Subexpression Elimination is the process of detecting that the terms `j * N0` and `N0 * (N_dt0_s_dt + j)` are invariant during the innermost loop, and therefore can be evaluated once. i.e. doing exactly what your second example does explicitly. In practice, I wouldn't be at all surprised if the inner loop is actually optimized down considerably more aggressively. Something along the lines of not actually iterating over ints at all, but instead creating a couple of anonymous pointers that address the two matrices, and iterating them directly. – dgnuff Jul 19 '18 at 00:02
  • 2
    This is clearly [Premature Optimization](https://en.wikipedia.org/wiki/Program_optimization#When_to_optimize). Write code for readability and maintainability first, and optimize only AFTER you have profiled the code and identified a bottleneck. – Jim Garrison Jul 19 '18 at 00:59

2 Answers2

4

You'd want to look at the assembly produced to be sure. That said, Common Subexpression Elimination is a very well understood optimization technique, and as far as I know, most of the commonly used compilers nowadays will do it with suitable optimization options.

Anecdotally, optimizers are getting quite good at their jobs nowadays. It used to be that you'd get a slap on the wrist if you wrote:

for (auto it = container.begin(); it != container.end(); it++)

However I've tested MSVC by writing two equivalent functions, one of which used the above, while the other used the "approved":

for (auto it = container.begin(); it != container.end(); ++it)

with no other differences.

Not only did MSVC correctly realize that it could replace it++ with ++it since I didn't actually use the value, it then proceeded to notice during linkage that the two routines were now identical, so it completely elided one of them, and used the object code of the other for both invocations.

dgnuff
  • 3,195
  • 2
  • 18
  • 32
  • 2
    C compilers still give you a slap on the wrist if you write C++ code and present it to a C compiler. – Jonathan Leffler Jul 18 '18 at 23:47
  • @JonathanLeffler touche. :) Although the case can be made that with both MSVC ands GCC, it's the same executable doing the compilation, so optimization techniques of this sort to a large degree become language agnostic between C and C++. – dgnuff Jul 18 '18 at 23:51
  • Posting invalid C code to a C question is less valuable than using compilable C code. Drop `auto`. – chux - Reinstate Monica Jul 19 '18 at 03:41
3

It is not as much for an optimization, but rather for the readability. I would go one step further, and realize that in dec1 = N0 *(N_dt0_s_dt + j) is in fact dec1 = N0 *N_dt0_s_dt + N0*j, and that the first term does not depend on j, while N0*j is dec, leading to

    for (int N_dt0_s_dt = 0; N_dt0_s_dt < N_dt; N_dt0_s_dt++)
    {
        int something = N0 * N_dt0_s_dt;
        for (int j = 0; j < N; j++)
        {
            dec = N0 * j;
            dec1 = something + dec;
            for (int i = 0; i < N0; i++)
            {
                Matrix_t0[i + dec] += M_t_t0_loc[i + dec1];
            }
        }
    }

or even

    int something = 0;
    for (int N_dt0_s_dt = 0; N_dt0_s_dt < N_dt; N_dt0_s_dt++)
    {
        int something += N0;;
        for (int j = 0; j < N; j++)
        {
            dec = N0 * j;
            dec1 = something + dec;
            for (int i = 0; i < N0; i++)
            {
                Matrix_t0[i + dec] += M_t_t0_loc[i + dec1];
            }
        }
    }

Of course something is a bad name, but there is a strong hint that it represents an important concept in the problem domain. I have a suspicion of what it actually is, but without seeing more of the context I cannot be sure. You can. Try to come up with a good name, and observe the improvement of readability.

user58697
  • 7,808
  • 1
  • 14
  • 28