11

Consider the following code:

#include <string.h>

void bar(char c);

void foo(const char* restrict ss) {
    for (int i = 0; i < strlen(ss); ++i) {
        bar(*ss);
    }
}    

I would expect the strlen(ss) to be hoisted out of the loop in these essentially ideal conditions; and yet - it isn't, neither by clang 5.0 nor by gcc 7.3 with maximum optimization (-O3).

Why is this the case?

Note: Inspired by (my answer to) this question.

einpoklum
  • 118,144
  • 57
  • 340
  • 684
  • 7
    General advice: if you want a function hoisted out of a loop do it yourself. C is better at it than most languages in allowing this optimization to work. – Joshua Jan 28 '18 at 00:47
  • 2
    @Joshua: Well, yes, but you could say that about an endless number of optimizations which compiler do apply themselves. – einpoklum Jan 28 '18 at 01:08
  • 2
    Why would you expect it to? Functions can have side effects, and moving the call outside the loop can therefore change the semantics of the program. – Andrew Henle Jan 28 '18 at 01:10
  • @AndrewHenle: Are you talking about the potential side-effects of `bar()`? – einpoklum Jan 28 '18 at 01:12
  • 1
    The compiler is definitely allowed to hoist it. Going by [C11 6.7.3.1](https://port70.net/~nsz/c/c11/n1570.html#6.7.3.1), "During each execution of [`foo`], let L be any lvalue that has &L based on [`ss`]. If L is used to access the value of the object X that it designates, and X is also modified (by any means), then the following requirements apply: [the type `ss` points to] shall not be const-qualified... If these requirements are not met, then the behavior is undefined." – user2357112 Jan 28 '18 at 01:19
  • 4
    You might also want to consider the possibility that no-one's actually ever coded up this optimisation, even if it *is* valid. There's no *requirement* that programs be optimised maximally, I suspect an implementation would still be valid if all `i` variables were stored on a slow hard disk on a server farm in Botswana :-) – paxdiablo Jan 28 '18 at 01:36
  • Feel free to propose such an optimization to GCC and/or to Clang. – Basile Starynkevitch Jan 28 '18 at 01:44
  • I consider that question to be opinion based, so I voted to close it. Any "why GCC don't optimize in such a way" question can be answered with "because nobody contributed to GCC to propose that optimization" and why it is so is really a matter of opinion. – Basile Starynkevitch Jan 28 '18 at 01:58
  • In contrast, a question such as "why does GCC optimize in such way" can be objectively answered with references to standard, and/or by pointing some internal GCC passes doing that optimization. – Basile Starynkevitch Jan 28 '18 at 02:00
  • @einpoklum: I remember when the C compiler wasn't allowed to assume that standard library functions do what they say they do. – Joshua Jan 28 '18 at 03:49
  • 1
    @BasileStarynkevitch It is not *necessarily* opinion-based, namely if there is a factual reason in the language or the code forbidding the optimization. One such reason could be the semantics that you understand `restrict` has, which apparently differ from the OP's. I think the OP is asking in order to see whether they didn't know, overlooked or misunderstood something. An answer of "there is nothing in the language or the code forbidding it" is therefore valuable. – Peter - Reinstate Monica Jun 17 '20 at 08:43

3 Answers3

6

The other answers claim that the strlen call cannot be hoisted because the string's contents might change between calls. Those answers do not properly account for the semantics of restrict; even if bar had access to the string through a global variable or some other mechanism, the semantics of restrict pointers to const types should(see caveat) prohibit bar from modifying the string.

From C11, N1570 draft, 6.7.3.1:

1 Let D be a declaration of an ordinary identifier that provides a means of designating an object P as a restrict-qualified pointer to type T.

2 If D appears inside a block and does not have storage class extern, let B denote the block. If D appears in the list of parameter declarations of a function definition, let B denote the associated block. Otherwise, let B denote the block of main (or the block of whatever function is called at program startup in a freestanding environment).

3 In what follows, a pointer expression E is said to be based on object P if (at some sequence point in the execution of B prior to the evaluation of E) modifying P to point to a copy of the array object into which it formerly pointed would change the value of E.137) Note that ''based'' is defined only for expressions with pointer types.

4 During each execution of B, let L be any lvalue that has &L based on P. If L is used to access the value of the object X that it designates, and X is also modified (by any means), then the following requirements apply: T shall not be const-qualified. Every other lvalue used to access the value of X shall also have its address based on P. Every access that modifies X shall be considered also to modify P, for the purposes of this subclause. If P is assigned the value of a pointer expression E that is based on another restricted pointer object P2, associated with block B2, then either the execution of B2 shall begin before the execution of B, or the execution of B2 shall end prior to the assignment. If these requirements are not met, then the behavior is undefined.

5 Here an execution of B means that portion of the execution of the program that would correspond to the lifetime of an object with scalar type and automatic storage duration associated with B.

Here, the declaration D is const char* __restrict__ ss, and the associated block B is the body of foo. All lvalues through which strlen accesses the string have &L based on ss(see caveat), and those accesses occur during the execution of B (since, by the definition in section 5, the execution of strlen is part of the execution of B). ss points to a const-qualified type, so by section 4, the compiler is allowed to assume string elements accessed by strlen are not modified during the execution of foo; modifying them would be undefined behavior.

(caveat)The above analysis assumes that strlen accesses the string through "ordinary" pointer dereferencing or indexing. If strlen uses techniques like SSE intrinsics or inline assembly, it is not clear to me whether such accesses technically count as using an lvalue to access the value of the object that it designates. If they do not count as such, the protections of restrict may not apply, and the compiler may not be able to perform the hoisting.

Maybe the above caveat voids the protections of restrict. Maybe the compiler doesn't know enough about the definition of strlen to analyze its interaction with restrict (I'm surprised it wasn't inlined). Maybe the compiler is free to perform the hoisting and just didn't realize it; perhaps some relevant optimization is not implemented, or it failed to propagate the necessary information between the right compiler components. Determining the exact cause would require much more familiarity with the GCC and Clang internals than I possess.

Further-simplified tests that eliminate strlen and the loop show that Clang definitely has some support for restrict-pointer-to-const optimizations, but I wasn't able to observe any such support from GCC.

user2357112
  • 260,549
  • 28
  • 431
  • 505
  • This would mean that `foo` is providing a guarantee, or at least allowing the compiler to assert, that `bar` - even when implemented in another compilation unit - doesn't modify the memory pointed to by `ss`, or at least those elements accessed by `strlen`? Even if the `bar` doesn't honour the guarantee, that's pretty much the same situation as C99's `restrict` for non-overlapping regions, or C++11's `noexcept`. But that quoted section is harder to read than an EULA:) – Brett Hale Jan 28 '18 at 12:52
  • I have tried with different implementations of strlen: int __attribute__ ((noinline)) __attribute__ ((const)) strlen_(char const * restrict a) { int i = 0; while ('\0' != *a) { ++i; ++a; } return i; } When one forbid inline, the loop is optimized out. So, inline optimization can affect negatively other optimizations. – Michal Butterweck Jan 28 '18 at 15:20
  • None of the precludes a function that can not modify a `const`-qualified argument from having side effects. If a function can have side effects, it can not be moved from the loop. – Andrew Henle Jan 28 '18 at 18:22
  • @AndrewHenle: The function the questioner wants hoisted is `strlen`, which doesn't have side effects. (Of course, it's possible that the parts of the compiler that know that and the parts of the compiler that know about `restrict` couldn't combine their information to perform the hoisting; that would fall under the "failed to propagate the necessary information" hypothesis.) – user2357112 Jan 28 '18 at 18:26
2

Since strlen is being passed a pointer and it is possible that the contents of the memory it is pointing to will change between invocations to strlen so optimizing the call away could introduce bugs. If you can guarantee to the gcc that the function will always return the same value it will optimize it away. From the function attribute documentation:

const

Many functions do not examine any values except their arguments, and have no effects except to return a value. Calls to such functions lend themselves to optimization such as common subexpression elimination. The const attribute imposes greater restrictions on a function’s definition than the similar pure attribute below because it prohibits the function from reading global variables. Consequently, the presence of the attribute on a function declarations allows GCC to emit more efficient code for some calls to the function. Decorating the same function with both the const and the pure attribute is diagnosed.

So taking out the external dependency on strlen, look at the difference in the following two compilations:

int baz (const char* s) __attribute__ ((pure));

void foo(const char* __restrict__ ss)
{
    for (int i = 0; i < baz(ss); ++i)
        bar(*ss);
}

Yields:

foo:
        push    rbp
        push    rbx
        mov     rbp, rdi
        xor     ebx, ebx
        sub     rsp, 8
        jmp     .L2
.L3:
        movsx   edi, BYTE PTR [rbp+0]
        add     ebx, 1
        call    bar
.L2:
        mov     rdi, rbp
        call    baz
        cmp     eax, ebx
        jg      .L3
        add     rsp, 8
        pop     rbx
        pop     rbp
        ret

But if we change the pure attribute on baz to const you can see the call hoisted out of the loop:

foo:
        push    r12
        push    rbp
        mov     r12, rdi
        push    rbx
        xor     ebx, ebx
        call    baz
        mov     ebp, eax
        jmp     .L2
.L3:
        movsx   edi, BYTE PTR [r12]
        add     ebx, 1
        call    bar
.L2:
        cmp     ebp, ebx
        jg      .L3
        pop     rbx
        pop     rbp
        pop     r12
        ret

So maybe hunt through your header files and see how strlen is declared.

Community
  • 1
  • 1
Turn
  • 6,656
  • 32
  • 41
  • BTW, I suggest to use `gcc -O2 -fverbose-asm -S` and to show the emitted `.s` assembler file – Basile Starynkevitch Jan 28 '18 at 02:43
  • @BasileStarynkevitch Do you think that would demonstrate the principle more clearly? – Turn Jan 28 '18 at 02:48
  • "Since strlen is being passed a pointer and it is possible that the contents of the memory it is pointing to will change between invocations to strlen" <- No, it is not possible, due to the `restrict` semantics and the fact that `foo()` does not change it. – einpoklum Jun 17 '20 at 08:29
1

ss might be some global variable because you could call foo with some global array like char str[100]; as its argument (e.g. by having foo(str); in your main)...

and bar could modify that global variable (then strlen(ss) could change at every loop).

BTW restrict has perhaps not the meaning you believe. Read carefully section §6.7.3 of the C11 standard and §6.7.3.1. IMHO restrict is in practice mostly useful on two formal arguments of the same function to express the fact that they are not "aliases" or "overlapping" pointers (if you guess what I really mean), and perhaps optimization efforts on restrict have probably been focused on such cases.

Perhaps (but unlikely), on your particular program, a compiler might optimize as you want if you invoke it as gcc -flto -fwhole-program -O3 (on every translation unit, and at program link time). I won't bet on that (but I leave you to check).

Why is this case?

As to why current GCC (or Clang) does not optimize like you want it to, it is because nobody has written such an optimization pass and have it enabled at -O3.

Compilers are not required to do optimizations, just allowed to do some of them (at the choice of their implementors).

Since it is free software, feel free to propose a patch by contributing to GCC (or to Clang). You might need a whole year of work, and you are not sure that your optimization would be accepted (because in practice nobody codes like you show, or because your optimization would be too specific, so unlikely to be triggered, but still slowing down the compiler). But you are welcome to try.

Even if §6.7.3.1 allows your optimization (as the answer by user2357112 demonstrates), it might practically not worth the effort to implement it.

(my intuition is that implementing such an optimization is hard, and won't profit much to existing programs in practice)

BTW, you can definitely experiment such an optimization by coding some GCC plugin doing it (since the plugin framework was designed for such experimentations). You might discover that coding such an optimization is a lot of work and that practically speaking it does not improve the performance of most existing programs (e.g. in a Linux distribution), because few people code that way.

Both GCC and Clang are free software projects, and contributors to them are (from the point of view of e.g. the FSF) volunteers. So feel free to improve GCC (or Clang) like you want it to optimize. By past experience, contributing even a small piece of code to GCC takes a lot of time. And GCC is a huge program (about ten millions lines of code), so understanding its internals is not easy.

Basile Starynkevitch
  • 223,805
  • 18
  • 296
  • 547
  • It's `__restrict__`'ed, so regardless of whether it's a global or what-not, we're guaranteed nothing will change it while it's being used in `foo()`. – einpoklum Jan 28 '18 at 00:39
  • 3
    Unfortunately, `restrict` has not the (powerful) meaning you want it to have. – Basile Starynkevitch Jan 28 '18 at 00:42
  • 1
    The standard section you quote is not a limitation on what compilers are allowed to assume from the presence of a `restrict` qualifier. It doesn't support the argument you're trying to make. – user2357112 Jan 28 '18 at 01:05
  • Removed it, and I won't copy all of §6.7.3 and §6.7.3.1 into my answer. – Basile Starynkevitch Jan 28 '18 at 01:06
  • Even after your deletion I think you're misinterpreting what compilers are allowed to do with `__restrict__`. Of course I might be wrong. The question is, really, what the compiler authors think they're allowed to do. – einpoklum Jan 28 '18 at 01:07
  • BTW, you could propose a patch doing your optimization, since [GCC](http://gcc.gnu.org/) is [free software](https://en.wikipedia.org/wiki/Free_software). I don't think that coding such a patch is easy (it might take you a year of work), and I am not sure it would be accepted easily by the GCC community. But you can try... Good luck on that! – Basile Starynkevitch Jan 28 '18 at 01:10
  • @BasileStarynkevitch: I doubt this is a bug, it's probably "intentional", for some definition of the word. – einpoklum Jan 28 '18 at 01:13
  • Yes, the intention is related to my explanation – Basile Starynkevitch Jan 28 '18 at 01:14
  • Even if my understanding of `restrict` is not fully correct (and I do admit I never understood all of it; I rarely use it, and when I do it is on two formals of same function), the GCC community is not required to optimize like OP wishes it to. – Basile Starynkevitch Jan 28 '18 at 02:11
  • `__restrict__` is a gcc extension, it doesn't have to correspond to C11 `restrict` – M.M Jan 29 '18 at 09:23