1

Imagine calculating a three dimensional array like this:

for (int i = 0; i < I; i++)
{
    for (int j = 0; j < J; j++)
    {
        for (int k = 0; k < K; k++)
        {
            array[k + j * K + i * K * J] = someValue(i, j, k);
        }
    }
}

But the k + j * K + i * K * J part is kinda expensive. Is it possible to tell the compiler to convert the loops into something like this?

array[0] = someValue(0, 0, 0);
array[1] = someValue(0, 0, 1);
array[2] = someValue(0, 0, 2);
array[3] = someValue(0, 1, 0);
...

This would ofcourse make the binaries larger, but would also speed up performance, if this code is executed a lot. Is it possible to do this? Or would I have to generate the code myself and paste it into the source file?

Jannis
  • 233
  • 1
  • 7
  • You ask about loop unrolling, an optimization, but doesn't specify a compiler. There is no answer that fits *all compilers*. It will be compiler specific. Anyway, here's a GCC specific Q&A https://stackoverflow.com/questions/4071690/tell-gcc-to-specifically-unroll-a-loop – StoryTeller - Unslander Monica Jun 01 '19 at 08:07
  • Have you measured, benchmarked and profiled that this is a major bottleneck in your program? Have you done it with compiler optimizations enabled? – Some programmer dude Jun 01 '19 at 08:18
  • 1
    I haven't written the program yet, currently I'm planning. I'd like to have a solution for atleast GCC and MSVC. – Jannis Jun 01 '19 at 08:28
  • 1
    no need to worry about that. compilers are smart enough to optimize for loops and often replace multiplication with pointer increment. In your case [only a single multiplication is used](https://godbolt.org/z/f5sI90) – phuclv Jun 01 '19 at 09:26

1 Answers1

2

I believe in your particular case, we can re-write the loop as:

auto* scan = array;
for (int i = 0; i < I; i++)
{
    for (int j = 0; j < J; j++)
    {
        for (int k = 0; k < K; k++)
        {
            *scan++ = someValue(i, j, k);
        }
    }
}

This is a micro-optimization, and not something you usually have to worry about. Here's why.

Reason 1: integer multiplication is incredibly cheap. Calculating k + j * K + i * K * J is cheaper than retrieving a value from the computer's RAM, and it'll be about as cheap as (if not cheaper than) retrieving it from the CPU's fastest cache.

Reason 2: Compilers are incredibly smart. They can recognize which values change and which values stay the same, and optimize common sub-expressions out of loops (so that they're not performing the same computation multiple times).

Reason 3: Compilers are capable of taking advantage of vectorization instructions. Depending on what someValue does, it may be able to compute multiple values in parallel on the same core by taking advantage of this. This is true for either method of indexing into array.

C++ code isn't strictly imperative. Compilers can and do make major and complex optimizations to make code more efficient, and code like the one in your example are easy for them to optimize.

Alecto Irene Perez
  • 10,321
  • 23
  • 46