
I wrote this code and found that it acts differently with different versions of gcc.

The source code,

#include<stdio.h>

int *fun();

int main(int argc, char *argv[])
{
    int *ptr;

    ptr = fun();

    printf("%x", *ptr);
}

int *fun()
{
    int *ptr;
    int foo = 0xdeadbeef;
    ptr = &foo;

    return ptr;
}

The code is wrong. After fun() returns, the local variable foo no longer exists. But the main function still tries to use it through the returned pointer, so I expected it to lead to a segmentation fault.

But I tried the same code on three versions of gcc and they act differently.

In 10.2.0

$ gcc -v | bin/pbcopy
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-pc-linux-gnu/10.2.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: /build/gcc/src/gcc/configure --prefix=/usr --libdir=/usr/lib --libexecdir=/usr/lib --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=https://bugs.archlinux.org/ --enable-languages=c,c++,ada,fortran,go,lto,objc,obj-c++,d --with-isl --with-linker-hash-style=gnu --with-system-zlib --enable-__cxa_atexit --enable-cet=auto --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-default-ssp --enable-gnu-indirect-function --enable-gnu-unique-object --enable-install-libiberty --enable-linker-build-id --enable-lto --enable-multilib --enable-plugin --enable-shared --enable-threads=posix --disable-libssp --disable-libstdcxx-pch --disable-libunwind-exceptions --disable-werror gdc_include_dir=/usr/include/dlang/gdc
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 10.2.0 (GCC)


$ gcc a.c && a.out
deadbeef%

It prints deadbeef.

Its assembly code:

(gdb) disassemble fun 
Dump of assembler code for function fun:
   0x000000000000119d <+23>:    movl   $0xdeadbeef,-0x14(%rbp)
   0x00000000000011a4 <+30>:    lea    -0x14(%rbp),%rax
   0x00000000000011a8 <+34>:    mov    %rax,-0x10(%rbp)
   0x00000000000011ac <+38>:    mov    -0x10(%rbp),%rax
   0x00000000000011b0 <+42>:    mov    -0x8(%rbp),%rdx
   0x00000000000011b4 <+46>:    sub    %fs:0x28,%rdx
   0x00000000000011bd <+55>:    je     0x11c4 <fun+62>
   0x00000000000011bf <+57>:    call   0x1030 <__stack_chk_fail@plt>
   0x00000000000011c4 <+62>:    leave  
   0x00000000000011c5 <+63>:    ret    
End of assembler dump.

(gdb) disass main

   0x000000000000116c <+35>:    mov    %eax,%esi
   0x000000000000116e <+37>:    lea    0xe8f(%rip),%rdi        # 0x2004
   0x0000000000001175 <+44>:    mov    $0x0,%eax
   0x000000000000117a <+49>:    call   0x1040 <printf@plt>

The assembly shows that fun stores 0xdeadbeef on the stack at -0x14(%rbp) and returns the address of that slot in %rax. main then passes the value in %eax to printf as %esi, so it prints deadbeef.

In 9.3.0:

coolder@ASUS:~$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/9/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none:hsa
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 9.3.0-15' --with-bugurl=file:///usr/share/doc/gcc-9/README.Bugs --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,gm2 --prefix=/usr --with-gcc-major-version-only --program-suffix=-9 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-bootstrap --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-plugin --enable-default-pie --with-system-zlib --with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none=/build/gcc-9-0xEOmg/gcc-9-9.3.0/debian/tmp-nvptx/usr,hsa --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu --with-build-config=bootstrap-lto-lean --enable-link-mutex
Thread model: posix
gcc version 9.3.0 (Debian 9.3.0-15)


coolder@ASUS:~$ gcc a.c && ./a.out
a.c: In function ‘fun’:
a.c:14:9: warning: function returns address of local variable [-Wreturn-local-addr]
   14 |  return &a;
      |         ^~
0coolder@ASUS:~$ 

It prints 0.

Its assembly code,

(gdb) disassemble fun
Dump of assembler code for function fun:
   0x000055555555515e <+0>:     push   %rbp
   0x000055555555515f <+1>:     mov    %rsp,%rbp
   0x0000555555555162 <+4>:     movl   $0xdeadbeef,-0x4(%rbp)
   0x0000555555555169 <+11>:    mov    $0x0,%eax
   0x000055555555516e <+16>:    pop    %rbp
(gdb) disass main
   0x000055555555513e <+9>:     call   0x55555555515e <fun>
   0x0000555555555143 <+14>:    mov    %rax,%rsi
   0x0000555555555146 <+17>:    lea    0xeb7(%rip),%rdi        # 0x555555556004
   0x000055555555514d <+24>:    mov    $0x0,%eax

The assembly shows that fun moves 0 into %eax, and main passes %rax to printf in %rsi, so it prints 0.

In 5.4.1

➜  ~ gcc a.c && ./a.out 
a.c: In function ‘fun’:
a.c:17:9: warning: function returns address of local variable [-Wreturn-local-addr]
  return &a;
         ^
[1]    3566 segmentation fault (core dumped)  ./a.out

It gets a segmentation fault, as I expected.

Its assembly code,

(gdb) disassemble fun 
Dump of assembler code for function fun:
   0x08048448 <+0>:     push   %ebp
   0x08048449 <+1>:     mov    %esp,%ebp
   0x0804844b <+3>:     sub    $0x10,%esp
   0x0804844e <+6>:     movl   $0xdeadbeef,-0x4(%ebp)
   0x08048455 <+13>:    mov    $0x0,%eax
   0x0804845a <+18>:    leave  
   0x0804845b <+19>:    ret   

(gdb) disass main
   0x0804841d <+17>:    call   0x8048448 <fun>
   0x08048422 <+22>:    mov    %eax,-0xc(%ebp)
   0x08048425 <+25>:    mov    -0xc(%ebp),%eax
   0x08048428 <+28>:    mov    (%eax),%eax

The assembly shows that fun moves 0x0 into %eax, and main then dereferences %eax (mov (%eax),%eax), so this leads to a segmentation fault.

So why is the assembly code so different?

Any help will be appreciated.

Peter Cordes
coolder
  • 7
    Your code has undefined behaviour. The compiler is free to generate whatever code it wants. In your case, different versions of the compiler decide differently. Compare https://stackoverflow.com/questions/2397984/undefined-unspecified-and-implementation-defined-behavior. – Werner Henze Mar 13 '21 at 15:39
  • 3
    When `fun` returns, `foo` goes out of scope, so any pointers to `foo` become invalid. If you try to dereference such a pointer, you're trying to access a stack frame that is no longer active. So anything can happen. Short answer: Don't do it. – Tom Karzes Mar 13 '21 at 15:41
  • Does this answer your question? [Undefined, unspecified and implementation-defined behavior](https://stackoverflow.com/questions/2397984/undefined-unspecified-and-implementation-defined-behavior) – n. m. could be an AI Mar 13 '21 at 15:44
  • 1
    Add `-O1` or higher and you will also get 0 with gcc-10. – Marc Glisse Mar 13 '21 at 16:35

1 Answer

4

Returning the address of a local variable and trying to access it after its lifetime is over is undefined behavior. Rationalizing what happens under the hood is a fool's errand, because there are no standard rules to be followed (apart, of course, from the aforementioned and linked UB rules). It's quite common for different compiler versions to change the way a situation like this is dealt with.
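
For contrast, here is a minimal sketch of three common ways to hand a value back to the caller without returning the address of an automatic variable. The function names `fun_out`, `fun_heap` and `fun_static` are hypothetical and chosen only for this illustration; they are not part of the question's code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: the caller owns the storage and passes its address in. */
void fun_out(unsigned int *out)
{
    assert(out != NULL);
    *out = 0xdeadbeefu;
}

/* Hypothetical helper: heap storage outlives the call.
 * Returns NULL if allocation fails; the caller must free() the result. */
unsigned int *fun_heap(void)
{
    unsigned int *p = malloc(sizeof *p);
    if (p != NULL)
        *p = 0xdeadbeefu;
    return p;
}

/* Hypothetical helper: static storage lives for the whole program.
 * Every caller shares the same object, so this is not reentrant. */
unsigned int *fun_static(void)
{
    static unsigned int foo = 0xdeadbeefu;
    return &foo;
}
```

A caller could declare `unsigned int v;`, call `fun_out(&v);` and print `v` with `printf("%x", v)`. In all three variants the object that is read is still alive at the point of use, so the result does not depend on the compiler version or optimization level.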

anastaciu
  • 1
    Thank you. You helped me a lot. – coolder Mar 13 '21 at 15:47
  • 1
    @coolder: For some other examples of compile-time-visible UB, modern GCC and clang will sometimes emit `ud2` (illegal instruction) on that path of execution, or just leave off emitting any code for that path (including the ret) so execution just falls into whatever is next. Because literally anything is allowed to happen in a program that encounters UB. – Peter Cordes Mar 14 '21 at 02:00
  • @PeterCordes: Note that returning the address of an automatic object would not invoke UB if the recipient of the pointer never does anything with the return value beyond possibly storing the return value into a pointer object whose value is never used. If a call to the function is expanded in-line, and the calling code makes use of the returned value, a compiler might usefully recognize that some paths would definitely access an indeterminate pointer and replace the accesses with traps, but the return from the function would otherwise have defined behavior. – supercat Mar 14 '21 at 19:26
  • @supercat: The other kinds of UB where you can get UD2 or no-ret include falling off the end of a non-void function *in C++* where it's UB on the spot. (In C, it's only UB if the caller uses the return value). Yes, for *this* case the function has to work as long as the return value isn't dereferenced. GCC optimizes it to `xor eax,eax` instead of `mov rax,rsp` both for performance and to make failure noisy if you *do* deref. Or with optimization if this function inlines into a caller that does deref, then we have truly compile-time-visible UB and a UD2 is totally possible. – Peter Cordes Mar 15 '21 at 01:23