
When I run the following program, it always prints "yes". However, when I change SOME_CONSTANT to -2, it always prints "no". Why is that? I am using the Visual Studio 2019 compiler with optimizations disabled.

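```cpp
#include <cstdio>

#define SOME_CONSTANT -3

void func() {
    static int i = 2;
    int j = SOME_CONSTANT;
    i += j;
}

int main() {
    // func returns void, but it is called through a bool(*)() pointer.
    // Calling through a mismatched function pointer type is undefined behavior.
    if (((bool(*)())func)()) {
        std::printf("yes\n");
    }
    else {
        std::printf("no\n");
    }
}
```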

EDIT: Here is the output assembly of func (IDA Pro 7.2). The constant 0FFFFFFFEh is -2, so this listing is from the SOME_CONSTANT = -2 build:

```asm
sub     rsp, 18h
mov     [rsp+18h+var_18], 0FFFFFFFEh
mov     eax, [rsp+18h+var_18]
mov     ecx, cs:i
add     ecx, eax
mov     eax, ecx
mov     cs:i, eax
add     rsp, 18h
retn
```

Here is the first part of main:

```asm
sub     rsp, 628h
mov     rax, cs:__security_cookie
xor     rax, rsp
mov     [rsp+628h+var_18], rax
call    ?func@@YAXXZ    ; func(void)
test    eax, eax
jz      short loc_1400012B0
```

Here is main decompiled:

```cpp
int __cdecl main(int argc, const char **argv, const char **envp)
{
  int v3; // eax

  func();
  if ( v3 )
    printf("yes\n");
  else
    printf("no\n");
  return 0;
}
```
Peter Cordes
  • Short answer: Undefined behavior. Longer answer: [Undefined, unspecified and implementation-defined behavior](https://stackoverflow.com/questions/2397984/undefined-unspecified-and-implementation-defined-behavior). – Algirdas Preidžius Jun 03 '20 at 09:19
  • @AlgirdasPreidžius But it *always* prints "yes" and "no" respectively. –  Jun 03 '20 at 09:20
  • Undefined behavior is undefined. The program might as well erase your hard drive. – UnholySheep Jun 03 '20 at 09:21
  • @super Doesn't matter. If it invokes undefined behavior, anything can happen. – Algirdas Preidžius Jun 03 '20 at 09:21
  • This C++ code does not make much sense. You need to look at the assembly to understand what is going on. – 463035818_is_not_an_ai Jun 03 '20 at 09:22
  • The assembly of main is where you would see what is happening. – Gerhard Jun 03 '20 at 09:24
  • I am not fluent in assembly, but I suppose `main` is also required to get the full picture. – 463035818_is_not_an_ai Jun 03 '20 at 09:26
  • `eax` is used for return values in some conventions, and it happens to keep the last `i` value when `func` exits. – bereal Jun 03 '20 at 09:27
  • It might be interesting to speculate about UB personally, but it is not for SO. Check the assembly to see what your compiler did, and check its source code to see why it might've done that. Also, `main()` returns `int`. – underscore_d Jun 03 '20 at 09:27
  • Most likely `bool` is returned in a register there. And the function makes use of the same register for the value computation, before assigning to `i` at its location. The function returns with the "correct" register holding a value. – StoryTeller - Unslander Monica Jun 03 '20 at 09:29
  • @super By telling your compiler to return `void`, you have yet another kind of UB. Hence the code can do anything, and it's not fruitful to discuss. – underscore_d Jun 03 '20 at 09:30
  • @underscore_d not sure if it is "not for SO", rather it is not so much for the `C++` tag but rather for `assembly` – 463035818_is_not_an_ai Jun 03 '20 at 09:31
  • This question was tagged with `undefined behavior` by OP for a reason. –  Jun 03 '20 at 09:31
  • So the function uses register eax for its computation results, and main uses eax in the if to decide whether to jump. So the logic of the if follows what happens in the function. – Gerhard Jun 03 '20 at 09:32
  • It's not a bug, it's a feature :D – Hack06 Jun 03 '20 at 09:35
  • @PeterCordes what uninitialized variable? – 463035818_is_not_an_ai Jun 03 '20 at 09:41
  • @idclev463035818: Oops, fixed. I had been looking at the decompiled code block, not the original source. That makes more sense; a debug build would probably have loaded from memory when reading a variable even if it was uninitialized, and thus would have read it as `0xcccccccc` because MSVC debug mode poisons that stack frame with `0xcc` bytes, exactly so that read-uninitialized is easy to detect. – Peter Cordes Jun 03 '20 at 09:45
  • Read Lattner's [blog on UB](http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html) – Basile Starynkevitch Jun 03 '20 at 10:24

3 Answers

```cpp
((bool(*)())func)()
```

This expression takes a pointer to func, casts the pointer to a different function pointer type, then invokes it. Calling a function through a pointer-to-function whose type does not match the function's actual type is undefined behavior, which means that anything at all might happen. From the moment this call happens, the behavior of the program cannot be reasoned about, and you cannot predict what will happen with any certainty. Behavior might differ between optimization levels, compilers, versions of the same compiler, or target architectures.

This is simply because the compiler is allowed to assume that you won't do this. When the compiler's assumptions and reality come into conflict, the result is a vacuum into which the compiler can insert whatever it likes.

The short answer to your question "why is that?" is: because it can. But tomorrow it might do something else.
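For contrast, a minimal sketch of what the standard does allow (assuming C++11 or later): converting a function pointer to an unrelated function pointer type is permitted, and converting it back to the original type yields the original pointer, which may then be called. Only the call through the mismatched type is undefined. The names `counter`, `bump`, `WrongType`, `RightType`, and `call_via_round_trip` are hypothetical, added for illustration.

```cpp
// Hypothetical stand-ins for the question's func and its static i.
static int counter = 2;
void bump() { counter += -3; }

using WrongType = bool (*)();  // the type the question casts to
using RightType = void (*)();  // bump's real type

// Calls bump through a round-tripped pointer and returns the new counter.
int call_via_round_trip() {
    // Converting to an unrelated function pointer type is allowed...
    WrongType wrong = reinterpret_cast<WrongType>(&bump);
    // ...but calling wrong() here would be undefined behavior.
    // Converting back to the original type yields the original pointer,
    // so calling through it is well-defined.
    RightType right = reinterpret_cast<RightType>(wrong);
    right();
    return counter;
}
```

The intermediate `wrong` pointer may be stored or passed around freely; the rule only bites at the call site.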

cdhowie
  • _From the moment this function call happens, the behavior of the program cannot be reasoned about._ It's worse than that even, [undefined behavior can cause time travel](https://devblogs.microsoft.com/oldnewthing/20140627-00/?p=633) (by Raymond Chen). – Eljay Jun 03 '20 at 11:35

What apparently happened is:

```asm
mov     ecx, cs:i
add     ecx, eax
mov     eax, ecx   ; <- final value of i is stored in eax
mov     cs:i, eax  ; and then also stored in i itself
```

Different registers could have been used; nothing in the code forces the compiler to pick eax. The `mov eax, ecx` is redundant, since ecx could have been stored straight to i, but that is simply what the compiler happened to emit.
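To make the register flow concrete, here is a hypothetical C++ transliteration of the unoptimized `func` listing, with locals named after the registers and the stack slot. It models this one debug build only; the language guarantees none of it. `model_func` and its parameters are illustrative names, not anything MSVC produces.

```cpp
// Hypothetical model of the unoptimized MSVC code for func.
// `i` stands in for func's static variable; locals mirror registers.
int model_func(int& i, int some_constant) {
    int stack_slot = some_constant;  // mov [rsp+18h+var_18], constant
    int eax = stack_slot;            // mov eax, [rsp+18h+var_18]
    int ecx = i;                     // mov ecx, cs:i
    ecx += eax;                      // add ecx, eax
    eax = ecx;                       // mov eax, ecx (the redundant copy)
    i = eax;                         // mov cs:i, eax
    return eax;                      // eax still holds the new i at retn
}
```

Returning `eax` is the modeling trick: the real func returns nothing, and the caller merely reads whatever is left in that register.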

And in main:

```asm
call    ?func@@YAXXZ    ; func(void)
test    eax, eax
jz      short loc_1400012B0
```

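In the Windows x64 calling convention, rax (or part of it, such as eax or al) holds the return value for integer-like types such as bool, so main's `test eax, eax` makes sense. It means the final value of i is used as the return value by accident: with SOME_CONSTANT = -3, i becomes 2 + (-3) = -1, which is nonzero, so "yes" is printed; with -2, i becomes 0, so "no" is printed.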
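If the intent is for the caller to see the updated value, the well-defined route is to make func actually return it and call it through its real type. A minimal sketch, assuming the caller wants "yes" whenever the new i is nonzero; `report` is a hypothetical helper that stands in for the question's main.

```cpp
#include <cstdio>

#define SOME_CONSTANT -3

// func now returns the updated value of its static variable.
int func() {
    static int i = 2;
    int j = SOME_CONSTANT;
    i += j;
    return i;
}

// Mirrors the question's main: prints "yes" when func's result is nonzero.
bool report() {
    bool nonzero = func() != 0;
    std::puts(nonzero ? "yes" : "no");
    return nonzero;
}
```

No cast is needed, so the result no longer depends on which register the compiler happens to use.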

harold
  • This looks like (partly?) un-optimized compiler output, otherwise even MSVC would be able to optimize away most or all of the `func` function body. So that explains the redundant `mov eax, ecx`; the compiler wasn't even trying. Strange that it didn't set up RBP as a frame pointer in a debug build, though. – Peter Cordes Jun 03 '20 at 09:36
  • @PeterCordes it is unoptimized as OP indicates, [here](https://godbolt.org/z/MK-joC) it is reproduced (the optimized version subtracts straight from `i` without polluting any registers) – harold Jun 03 '20 at 09:39

I always get "no" printed, so the output must vary from compiler to compiler; hence the best answer is UB (undefined behavior).

Hack06