1

I have the following two functions which do basically the same:

enum Direction{
    N = 0,
    NW,
    W,
    SW,
    S,
    SE,
    E,
    NE,
    TOTAL_DIRS
};

char const * const strings[] = {"N", "NW", "W", "SW", "S", "SE", "E", "NE"};

char const *
getDirString2(unsigned dir) {
    if (TOTAL_DIRS > dir)
        return strings[dir];
    return nullptr;
}

char const *
getDirString3(unsigned dir) {
    char const * const strings[] = {"N", "NW", "W", "SW", "S", "SE", "E", "NE"};
    if (TOTAL_DIRS > dir)
        return strings[dir];
    return nullptr;
}

But while g++ optimizes the function which uses the global array like I would expect. It creates much more, convoluted code for the alternative. Clang creates the same code for both and if I use a switch-statement instead, both clang and c++ also create the same code as for getDirString2.

Here is a link to compiler explorer https://godbolt.org/z/GxvrTv

Is this something for which I should file a bug report for g++ or is there a good reason for that?

Michael Mahn
  • 737
  • 4
  • 11

2 Answers2

4

I guess you could call this a missed optimisation, although that's a bit harsh on the gcc guys.

gcc, when compiling getDirString3, is doing exactly what you asked it to do - construct an array of strings on the stack and then return just one element of it.

clang, on the other hand, sees that this array never changes and constructs it in static storage instead, see: https://godbolt.org/z/24n-N7

To make gcc generate code like clang does, declare the array inside getDirString3 as static (which would have been a good idea in the first place), see: https://godbolt.org/z/henD2Z

Paul Sanders
  • 24,133
  • 4
  • 26
  • 48
  • Interesting, I would never have guessed that adding static creates better optimization here. I've also never heard that I should declare constants as static. Should I always declare them as static const if they're compile time constants? Is this a general rule? – Michael Mahn Jul 19 '19 at 18:47
  • 2
    @MichaelMahn For something like simple like `const int x = 4`, declaring it static makes no difference since the compiler will optimise the variable away anyway, but **within a function**, declaring something like `const int x [] = { 1, 2, 3, 4, 5 }` static may do since, without the static,you are declaring an array that has to be built on the stack each time the function is called. – Paul Sanders Jul 19 '19 at 19:02
0

Note: At first, I said that this is a non-conforming optimization, but after my discussion with Chris Dodd in the comments, I realized that this is partially incorrect.

The Updated Answer:

Is this something for which I should file a bug report for g++ or is there a good reason for that?

Both clang and gcc can do the same optimization, but it's disabled in gcc by default. So it's not a bug, and there's a reason for it.

What did clang do?

In your example, the compiler saw that the local array is constant and as a result there's no need to construct it on the stack each time the function is invoked.

How to enable the same optimization in gcc?

To enable the same optimization in gcc, use the the flag -fmerge-all-constants. The resultant code will be almost the same size as clang's code (see: https://godbolt.org/z/Ldx_qe):

getDirString3(unsigned int):
        xor     eax, eax
        cmp     edi, 7
        ja      .L1
        mov     edi, edi
        mov     rax, QWORD PTR strings.2080[0+rdi*8]

Why it's disabled by default in gcc?

From gcc documentation website:

-fmerge-all-constants ... Languages like C or C++ require each variable, including multiple instances of the same variable in recursive calls, to have distinct locations, so using this option results in non-conforming behavior.

In the above example the optimization was safe and followed the as-if rule, because the address of this local array wasn't used or compared with an address from different invocation. However, it seems that this optimization can lead in some cases to non-confroming behavior. You can check these cases: here and here.

So, should you use the -fnmerge-all-constants flag?

Both compilers don't check weather the address of the constants is used or not, therefore I don't recommend it. It's obvious in your example that the optimization is safe, but you can't know every time where youre compiler will optimize (or else you would've done it yourself) and you can't be sure that it won't lead sometimes to non-confroming behavior.

mt3d
  • 315
  • 2
  • 11
  • Note that the optimization is only non-conforming IF the address of the local constant is passed out of the function or is used in a way that might compare addresses from different invocations. Neither of those is true of `getDirString3` so it should be safe and legal. The problem being if the correct potential alias checking is not done, leading to applying the optimization in an unsafe fashion. – Chris Dodd Jul 19 '19 at 18:28
  • 1
    @ChrisDodd can you quote the standard? It'll help to improve the answer. And this optimization was applied in an unsafe fashion, see: https://lkml.org/lkml/2018/3/20/872 . – mt3d Jul 19 '19 at 18:46
  • This follows from the as-if rule -- if there's no way to observe the difference between the two, then the optimization is safe. The only way to observe this difference is if some code is dependent on the addresses of the constants in different frames. If nothing ever depends on those addresses, then it cannot be observed. – Chris Dodd Jul 19 '19 at 19:19
  • @mt3d Here it's different. In the bug report the address of the variable is used. Here it is not. – Rakete1111 Jul 19 '19 at 19:32
  • @ChrisDodd Let's try to find common ground between me and you. We both agree that the optimization in `getDirString3` case is safe, and that it follows the as-if rule, but my question is: what enabled this optimization? It's the `-fmerge-all-constants` flag, which exists in both clang and gcc (so it's not missed by the gcc guys) and enabled by default in clang. Now, do I think this flag is safe? I don't think so. I don't think clang checks whether the address is used or not, if so, it wouldn't have led to non-conforming behavior like the two cases above. It's just a matter of luck. – mt3d Jul 20 '19 at 12:00
  • @ChrisDodd So now what? Should we praise clang for this optimization and knock gcc for disabling it by default? Do we have to advise against using it? Should the programmer enable it and pray that nothing go wrong? If the programmer knew where the compiler will optimize and knew that it's safe, why wouldn't they make the optimization by themself? IMO, the flag should be disabled by default, gcc is doing the right thing, and maybe in the future, both compiler should provide a safer version of this flag by checking whether the address of the variable is used or not. What do you think? – mt3d Jul 20 '19 at 12:04
  • @mt3d: you should knock both of them for not implementing it correctly :-) You probably shouldn't use it as implemented because it was implemented unsafely, but if either compiler fixes it, then you should use it. – Chris Dodd Jul 20 '19 at 16:48
  • @ChrisDodd Totally agree. I re-wrote my answer. – mt3d Jul 21 '19 at 14:07
  • @mt3d Thank you! I've declared the array as "static" in my code now and noticed significant performance improvements. I've also tried fmerge-all-constants but didn't notice any further improvements, so I think I'm good now. And I agree that I should not file a bug with gcc. I wasn't aware that the standard requires constants to have distinct locations and although this optimitzation might be legal here, there is a good reason to omit by default. Learnt something new. – Michael Mahn Jul 22 '19 at 10:15