Why does (infinite) recursion give different results with and w/o -O3 in clang (and gcc/g++)?

Question

When I compile and run this code

#include <stdio.h>

int collatz(int a) {
    return a == 1 ? 1 : collatz((a%2) ? 3*a+1 : a/2);
}

int main() {
    for (int a = 0; a < 10; ++a)
        printf("%d: %d\n", a, collatz(a));
}

in clang (10.0.0-4ubuntu1) it crashes, but when I add -O3 it runs fine. (This holds both when passing this as C or C++ code to clang.)

I understand that it crashes without the optimization flag, since the call of collatz with 0 will lead to an infinite recursion. However, with -O3 it returns 0: 1, which is wrong. Is that a (known) bug in the optimization or am I missing something?

(gcc/g++ (9.3.0) also crash without optimization, and the compilation hangs with -O3.)

Edit: Just to add that I am not looking for the reason why the program hangs / fails (that is clear) but why the compiler apparently has the liberty to return some value whilst the program should not return (but enter an infinite loop).

Edit 2: Why haven't I picked a more minimal example? Well, here it is, this gives the same behavior

int divide(int i) {
  return i == 1 ? 1 : divide(i/2);
}

https://stackoverflow.com/questions/16436237/is-while1-undefined-behavior-in-c — KamilCuk, Aug 30 '21 at 12:21
FWIW, a little white space and a couple of if's gives you [this](https://wandbox.org/permlink/im25s0DaoWqMobDT) which guards against UB and IMHO is a lot easier to read. — NathanOliver, Aug 30 '21 at 12:26
Hmmm "infinite loop" vs. "infinite recursion", not quite the same. — chux - Reinstate Monica, Aug 30 '21 at 12:31
You mistakenly assume that a program with undefined behaviour should behave in some particular way — M.M, Aug 30 '21 at 12:34
I wouldn't expect an infinite loop to be undefined behavior? Infinite recursion may be different as you will, on any real computer, run into memory issues. But still crashing would be the more desirable / expected result. — fuenfundachtzig, Aug 30 '21 at 12:36
Sorry I removed by previous comment. It would be preferable to select one language tag, as the forward progress rule is different in C than in C++ — M.M, Aug 30 '21 at 12:40
@M.M: I didn't know it depends on that, so... apologies for putting both :) — fuenfundachtzig, Aug 30 '21 at 12:41
Hm, actually clang behaves the same for both C and C++ and gcc/g++ also behave the same, so I have this question for either language. — fuenfundachtzig, Aug 30 '21 at 12:44
See https://stackoverflow.com/a/66878066/1505939 . The C++ version of the rule is stronger (all threads must progress) — M.M, Aug 30 '21 at 12:44
Also https://stackoverflow.com/questions/18227093/infinite-recursion-in-c — William Pursell, Aug 30 '21 at 12:52
It's not that strange; `return 1` is the logical conclusion of "the only value this function ever returns is 1" and "assuming that every thread makes progress, this function must return a value at some point". If you move the recursion to a non-tail position, for instance with `1 + collatz(...)`, the result will probably be different. — molbdnilo, Aug 30 '21 at 12:56
@molbdnilo: Good point, I was wondering why it would pick `1` as result, but following your comment together with @eerorika's answer, it now actually makes sense. — fuenfundachtzig, Aug 30 '21 at 12:58
The only thing the C standard has to say about recursion is pretty much "Recursive function calls shall be permitted". Not very helpful. Infinite recursion isn't addressed by the standard, so it thereby sorts under undefined behavior. — Lundin, Aug 30 '21 at 13:49
That being said, clang can't even generate compliant C for `while(1)`, which _is_ well-defined... — Lundin, Aug 30 '21 at 13:50
Do not ask for C and C++ in a single question, even when they behave the same on a topic. They are 2 different languages. — 12431234123412341234123, Aug 30 '21 at 14:34

eerorika · Accepted Answer · 2021-08-30T12:48:23.673

6

Is that a (known) bug in the optimization or am I missing something?

It's not a bug in optimisation. It's a bug in the program that is being compiled. The behaviour of the program is undefined in C++.

it returns 0: 1, which is wrong

There is no "wrong" behaviour when the behaviour of the program is undefined.

but the compilation hangs with -O3

This is probably a compiler bug.

how can the compiler just decide to return something, whereas, if you follow the program logic, it should not.

When the behaviour of the program is undefined, the compiler can decide to do anything or to do nothing. There's nothing in particular that it should do or should not do. An undefined program has no logic to it.

why the behaviour is undefined

The C++ standard says:

[intro.progress]

The implementation may assume that any thread will eventually do one of the following:

terminate,

make a call to a library I/O function,

perform an access through a volatile glvalue, or

perform a synchronization operation or an atomic operation.

In case the argument is 0, the thread would never do any of those things, and as such the [language] implementation is allowed to assume that the function is never called with the argument of 0. When you call the function with such argument, you contradict this assumption and behaviour is undefined.

edited Aug 30 '21 at 12:48

answered Aug 30 '21 at 12:19

eerorika

232,697
12
197
326

This last point is actually my question: how can the compiler just decide to return something, whereas, if you follow the program logic, it should not. – fuenfundachtzig Aug 30 '21 at 12:32
2

@fuenfundachtzig The rule that allows the compiler to allow a value to be returned is the forward progress rule: https://timsong-cpp.github.io/cppwp/basic.exec#intro.progress-1 – NathanOliver Aug 30 '21 at 12:36
Interesting, thanks. I was expecting that the title of my question would not make sense any more once I have the answer :) I'll change it. – fuenfundachtzig Aug 30 '21 at 12:55
Even if the compiler is free to do what-ever if the program triggers UB, I would still view this as a bug from a usability point-of-view. The compiler did figure out that collatz(0) is UB, and also silently inlined collatz(0) - ignoring that it was known to be UB. It seems that this was changed in a later release. – Hans Olsson Aug 30 '21 at 13:55
Where does the C++ standard say that contradicting an assumption, as opposed to violating a constraint, renders the behavior undefined? On the face of it, the standard is granting the implementation license to implement certain behavior (the behavior that results from assuming the code does one of the listed things) and is not prohibiting the program from containing code that nominally loops forever. – Eric Postpischil Aug 30 '21 at 23:26
@EricPostpischil Standard also doesn't say that you're prohibited from dividing by zero. It just says that behaviour is undefined in that case. I don't think standard defines the meaning of the word "assume". We're left with having to interpret through its meaning in English. If the implementation is allowed to assume that the program will never be in a certain state, then it doesn't matter how the program behaves because that state will never be (assumed to be) entered. – eerorika Aug 31 '21 at 00:05
@EricPostpischil the C++ standard doesn't have "constraints", that is C standard terminology – M.M Aug 31 '21 at 00:06
@EricPostpischil For what it's worth, in the definition of UB, the standard notes: `Undefined behavior may be expected when this document omits any explicit definition of behavior or ...` The standard doesn't specify what happens when assumptions are broken, so UB may be expected `... or when a program uses an erroneous construct or erroneous data` I'd say that breaking assumptions is an error. – eerorika Aug 31 '21 at 00:09
I think you are reaching. The program is not violating an assumption because the assumption is not stated as any sort of restriction on the program. It is just a license granted to the implementation. And that license is to treat the program as if it eventually exits the loop, not as if its behavior is entirely undefined. – Eric Postpischil Aug 31 '21 at 00:18
@EricPostpischil I don't think I've said that assumption is any sort of restriction. `It is just a license granted to the implementation.` Indeed; like UB. – eerorika Aug 31 '21 at 01:37
No, it does not license the implementation to treat the behavior as undefined. It licenses the implementation to assume one of the listed events occurs. The license is limited. – Eric Postpischil Aug 31 '21 at 01:48
@EricPostpischil Of course. The behaviour is defined if the assumption is satisfied. – eerorika Aug 31 '21 at 01:49

Tzig · Answer 2 · 2021-08-30T13:09:11.627

Note: my answer is only true in C. As said in other answers in C++ the program has UB in it. This is one of the reason you shouldn't compile C code with a C++ compiler.

I tried to get the assembly generated by clang 10.0.0 through godbolt. I understand not everyone knows how to read assembly so I'll try to explain succintly what we can deduce from the listings. Here's what I generated:

With clang -S main.c:

collatz(int):
        push    {r11, lr}
        mov     r11, sp
        sub     sp, sp, #16
        str     r0, [r11, #-4]
        ldr     r0, [r11, #-4]
        cmp     r0, #1
        bne     .LBB0_2
        b       .LBB0_1
.LBB0_1:
        mov     r0, #1
        str     r0, [sp, #8]            @ 4-byte Spill
        b       .LBB0_6
.LBB0_2:
        ldr     r0, [r11, #-4]
        add     r1, r0, r0, lsr #31
        bic     r1, r1, #1
        sub     r0, r0, r1
        cmp     r0, #0
        beq     .LBB0_4
        b       .LBB0_3
.LBB0_3:
        ldr     r0, [r11, #-4]
        add     r0, r0, r0, lsl #1
        add     r0, r0, #1
        str     r0, [sp, #4]            @ 4-byte Spill
        b       .LBB0_5
.LBB0_4:
        ldr     r0, [r11, #-4]
        add     r0, r0, r0, lsr #31
        asr     r0, r0, #1
        str     r0, [sp, #4]            @ 4-byte Spill
        b       .LBB0_5
.LBB0_5:
        ldr     r0, [sp, #4]            @ 4-byte Reload
        bl      collatz(int)
        str     r0, [sp, #8]            @ 4-byte Spill
        b       .LBB0_6
.LBB0_6:
        ldr     r0, [sp, #8]            @ 4-byte Reload
        mov     sp, r11
        pop     {r11, lr}
        bx      lr
main:
        push    {r11, lr}
        mov     r11, sp
        sub     sp, sp, #16
        mov     r0, #0
        str     r0, [r11, #-4]
        str     r0, [sp, #8]
        b       .LBB1_1
.LBB1_1:                                @ =>This Inner Loop Header: Depth=1
        ldr     r0, [sp, #8]
        cmp     r0, #9
        bgt     .LBB1_4
        b       .LBB1_2
.LBB1_2:                                @   in Loop: Header=BB1_1 Depth=1
        ldr     r0, [sp, #8]
        str     r0, [sp, #4]            @ 4-byte Spill
        bl      collatz(int)
        ldr     r1, .LCPI1_0
        str     r0, [sp]                @ 4-byte Spill
        mov     r0, r1
        ldr     r1, [sp, #4]            @ 4-byte Reload
        ldr     r2, [sp]                @ 4-byte Reload
        bl      printf
        b       .LBB1_3
.LBB1_3:                                @   in Loop: Header=BB1_1 Depth=1
        ldr     r0, [sp, #8]
        add     r0, r0, #1
        str     r0, [sp, #8]
        b       .LBB1_1
.LBB1_4:
        ldr     r0, [r11, #-4]
        mov     sp, r11
        pop     {r11, lr}
        bx      lr
.LCPI1_0:
        .long   .L.str
.L.str:
        .asciz  "%d: %d\n"

Here, things look pretty normal, the function is defined, its behavior looks okay, the loop is formed correctly in the main so as we can expect, when it calls collatz(0), the stack overflows.

With clang -S -O3 main.c:

collatz(int):
        mov     r0, #1
        bx      lr
main:
        push    {r4, r10, r11, lr}
        add     r11, sp, #8
        ldr     r4, .LCPI1_0
        mov     r1, #0
        mov     r2, #1
        mov     r0, r4
        bl      printf
        mov     r0, r4
        mov     r1, #1
        mov     r2, #1
        bl      printf
        mov     r0, r4
        mov     r1, #2
        mov     r2, #1
        bl      printf
        mov     r0, r4
        mov     r1, #3
        mov     r2, #1
        bl      printf
        mov     r0, r4
        mov     r1, #4
        mov     r2, #1
        bl      printf
        mov     r0, r4
        mov     r1, #5
        mov     r2, #1
        bl      printf
        mov     r0, r4
        mov     r1, #6
        mov     r2, #1
        bl      printf
        mov     r0, r4
        mov     r1, #7
        mov     r2, #1
        bl      printf
        mov     r0, r4
        mov     r1, #8
        mov     r2, #1
        bl      printf
        mov     r0, r4
        mov     r1, #9
        mov     r2, #1
        bl      printf
        mov     r0, #0
        pop     {r4, r10, r11, lr}
        bx      lr
.LCPI1_0:
        .long   .L.str
.L.str:
        .asciz  "%d: %d\n"

clang is completely drunk here. It did optimize the loop as we would expect but it decided that collatz(0) = 1 for whatever reason and it compiled a very weird version of collatz only to never call it. You may say clang is a C++ compiler so it makes sense for it to do unexpected things but that's not true, clang advertise itself as "the Clang C, C++, and Objective-C compiler". To me, it's a compiler bug. Let's try with clang 12.0.1 just in case the bug was solved in a later version.

With the latest version clang -S -O3 main.c:

    .text
    .file   "main.c"
    .globl  collatz                         # -- Begin function collatz
    .p2align    4, 0x90
    .type   collatz,@function
collatz:                                # @collatz
    .cfi_startproc
# %bb.0:
                                        # kill: def $edi killed $edi def $rdi
    cmpl    $1, %edi
    jne .LBB0_2
    jmp .LBB0_5
    .p2align    4, 0x90
.LBB0_3:                                #   in Loop: Header=BB0_2 Depth=1
    leal    (%rdi,%rdi,2), %edi
    addl    $1, %edi
    cmpl    $1, %edi
    je  .LBB0_5
.LBB0_2:                                # =>This Inner Loop Header: Depth=1
    testb   $1, %dil
    jne .LBB0_3
# %bb.4:                                #   in Loop: Header=BB0_2 Depth=1
    movl    %edi, %eax
    shrl    $31, %eax
    addl    %edi, %eax
    sarl    %eax
    movl    %eax, %edi
    cmpl    $1, %edi
    jne .LBB0_2
.LBB0_5:
    movl    $1, %eax
    retq
.Lfunc_end0:
    .size   collatz, .Lfunc_end0-collatz
    .cfi_endproc
                                        # -- End function
    .globl  main                            # -- Begin function main
    .p2align    4, 0x90
    .type   main,@function
main:                                   # @main
    .cfi_startproc
# %bb.0:
    .p2align    4, 0x90
.LBB1_1:                                # =>This Inner Loop Header: Depth=1
    jmp .LBB1_1
.Lfunc_end1:
    .size   main, .Lfunc_end1-main
    .cfi_endproc
                                        # -- End function
    .ident  "clang version 12.0.1"
    .section    ".note.GNU-stack","",@progbits
    .addrsig

Here, the main function consists of a while(1) loop, which is what we expect given the input.

To conclude, I think both gcc and clang 10.0.0 have a bug, though one is easier to see as a bug than the other

It's not a bug for a program with undefined behaviour to produce unexpected behaviour — M.M, Aug 30 '21 at 12:48
@M.M how is the behavior undefined though? Recursion (infinite or otherwise) is defined in C [there's a very good question about it here](https://stackoverflow.com/questions/18227093/infinite-recursion-in-c) , as is 0%2 — Tzig, Aug 30 '21 at 12:56
Note that things such as stack overflows aren't undefined behavior in C as they *can't* exist in C. The C language bases itself on a theoretical machine with infinite memory, in such a machine infinite recursion is sensibly the same as a while(1) loop and poses no problem. — Tzig, Aug 30 '21 at 12:59
The standard says that loops must progress (see the top answer or https://stackoverflow.com/a/66878066/1505939) — M.M, Aug 30 '21 at 22:42
@M.M: The text you quote in that answer says certain iteration statements may be assumed to terminate. The only iteration statement in the code in the question is the `for` loop, and it does not qualify because it performs output operations. — Eric Postpischil, Aug 30 '21 at 23:23
@EricPostpischil No output operation is performed by the loop, since there is no code path in which evaluation of the arguments for `printf` completes — M.M, Aug 31 '21 at 00:09
@M.M: If the loop terminates, it had to go through the `printf`, and in fact the generated code does do the `printf`. The `for` loop is not the one the compiler applied the “may assume terminates” rule to. The loop caused by the recursion is the one the compiler (erroneously) applied the rule to. The behavior observed here is, for C, a compiler defect, and the program does not have undefined behavior in this regard. — Eric Postpischil, Aug 31 '21 at 00:23

Why does (infinite) recursion give different results with and w/o -O3 in clang (and gcc/g++)?

2 Answers2