Intuition for understanding gcc -Og and -Os loop output

Question

Background

I've assumed for a while that gcc will convert while-loops into do-while form. (See Why are loops always compiled into "do...while" style (tail jump)?)

And that -O0 for while-loops...

while (test-expr)
    body-statement

..Will generate code on the form of jump-to-middle do-while

    goto test;
loop:
    body-statement
test:
    if (test-expr) goto loop;

And gcc -O2 will generate guarded do while

if (test-expr)
    goto done;
loop:
    body-statement
    if (test-expr) goto loop;
done:

Concrete examples

Here are godbolt examples of functions for which gcc generates the kind of control flow I'm describing above (I use for-loops but a while loop will give the same code).

This simple function...

int sum1(int a[], size_t N) {
    int s = 0;
    for (size_t i = 0; i < N; i++) {
        s += a[i];
    }
    return s;
}

Will for -O0 generate this jump to middle code

```sum1:
        push    rbp
        mov     rbp, rsp
        mov     QWORD PTR [rbp-24], rdi
        mov     QWORD PTR [rbp-32], rsi
        mov     DWORD PTR [rbp-4], 0
        mov     QWORD PTR [rbp-16], 0
        jmp     .L2
.L3:
        mov     rax, QWORD PTR [rbp-16]
        lea     rdx, [0+rax*4]
        mov     rax, QWORD PTR [rbp-24]
        add     rax, rdx
        mov     eax, DWORD PTR [rax]
        add     DWORD PTR [rbp-4], eax
        add     QWORD PTR [rbp-16], 1
.L2:
        mov     rax, QWORD PTR [rbp-16]
        cmp     rax, QWORD PTR [rbp-32]
        jb      .L3
        mov     eax, DWORD PTR [rbp-4]
        pop     rbp
        ret

Will for -O2 generate this guarded-do code.

sum1:
        test    rsi, rsi
        je      .L4
        lea     rdx, [rdi+rsi*4]
        xor     eax, eax
.L3:
        add     eax, DWORD PTR [rdi]
        add     rdi, 4
        cmp     rdi, rdx
        jne     .L3
        ret
.L4:
        xor     eax, eax
        ret

My question

What I'm after is hand-wavy rule to apply when looking at -Os loops. I'm more used to looking at -O2 code and now that I'm working in the embedded field where -Os is more prevalent, I'm surprised by the form of loops I see.

It seems that gcc -Og and -Os both generate code as a jmpat a bottom and if() break at the top. Clang on the other hand generated guarded-do-while A godbolt link to gcc and clang output

Here is an example of gcc -Os output for the above function:

sum1:
        xor     eax, eax
        xor     r8d, r8d
.L2:
        cmp     rax, rsi
        je      .L5
        add     r8d, DWORD PTR [rdi+rax*4]
        inc     rax
        jmp     .L2
.L5:
        mov     eax, r8d
        ret

Am I correct in assuming that gcc -Og and -Os generates code on the form I described above?
Does anyone have a resource that describes the rationale for using while-form for -Og and -Os? Is it by design or an accidental fall-out form the way the optimization passes are organized.
I thought that converting loops into do-while form was part of the early canonicalization done by compilers? How come gcc -O0 generates do-while but gcc -Og gives while-loops? Do that canonicalization only happen when optimization is enabled?

Sidenote: I'm surprised by how much code generated with -Os and -O2 differ given that there aren't many compiler flags that are different. Maybe many passes checks some variable for tradeoff_speed_vs_space.

Also, nitpick on your pseudo-code syntax: shouldn't the condition at the bottom be written the same way for both cases that have it there? Also, the `while(test-expr));` should maybe be *instead* of `done`, not in the same scope as the loop body. I'd suggest using full C syntax because C has goto and `do{}while` — Peter Cordes, Sep 12 '20 at 14:45
Fun fact: `gcc` and `g++` are different at `-O0`: g++ makes craptastic loops with a `jmp` at the bottom and an `if()break` at the top, while `gcc` jumps to the condition of a `do{}while()` loop. (You can test on Godbolt by adding `-xc++` or `-xc` to compile as C++ or as C). Doesn't seem to make a difference at `-Og`, `-O1`, `-O2`, or `-Os` (which is a good thing; shows that the optimizing back-end re-formats the loop handed to it by the front-end, except at `-O0`) — Peter Cordes, Sep 12 '20 at 14:51
@PeterCordes: Ah, it's a `-O0` difference between `g++` and `gcc`. I've thought that it was input-dependent: Most of the time you get jump-to-middle, and sometimes you get the crappy `jmp` at bottom to `if()break` at top. I didn't make the connection between it being C++ specific. I guess I just inspect C code much more often than C++ code. Great to know! — Daniel Näslund, Sep 12 '20 at 15:05
Interesting question, though; I didn't expect if()break / jmp from `gcc -Os`; like you I rarely look at it. It looks like codegen choice changed in GCC4.8 vs. 4.7. Maybe they found that most loops didn't actually bottleneck on the front-end, or on back-end `jmp`/`jcc` execution throughput, and decided it was worth the tradeoff? (And Haswell and later can execute 2 branches per clock as long as one of them is not-taken, with an extra branch execution unit on one of the integer ALU ports). Or maybe it just happened because of some other change and it wasn't fully for its own sake. — Peter Cordes, Sep 12 '20 at 15:21

Intuition for understanding gcc -Og and -Os loop output

Background

Concrete examples

My question

0 Answers0