1

In the following code snippet, shouldn't str_s point to some location on the stack?

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

char* fun_s(){
    char str[8]="vikash";
    printf("s :%p\n", str);
    return str;
}

char* fun_h(){
    char* str = (char*)malloc(8);
    printf("h :%p\n", str);
    strcpy(str, "vikash");
    return str;
}

int main(){
    char* str_s = fun_s();
    char* str_h = fun_h();
    printf("s :%p\nh :%p\n", str_s, str_h);
    return 0;
}

I understand that there is a problem with the return in fun_s and that the content behind this pointer can't be trusted, but as I understand it, the pointer should still hold some stack address, not zero. I get the following output in my console. Can you please explain why the third line prints (nil) and not 0x7ffce7561220?

s :0x7ffce7561220
h :0x55c49538d670
s :(nil)
h :0x55c49538d670

GCC Version

gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

OS :Ubuntu 18.04.3 LTS

Peter Cordes
chandola
  • Reverse the order of calls (use `fun_h` first). Ultimately you can compile-to-asm and/or asm-debug this code. It's short, and pretty simple. At least you *appear* to understand `str_s` hosts a dangling pointer, which is more than most people that write code like this. Fwiw, I can't repro this, so kindly include your toolchain and host platform in your question. – WhozCraig Dec 15 '19 at 11:51
  • @lurker I agree that the content will be smashed, but shouldn't the location that str_s points to remain the same? That location is what I am trying to print. Please correct me if I am wrong. – chandola Dec 15 '19 at 11:53
  • Might be the effect of undefined behaviour? The compiler notices it’s illegal and is allowed to do anything at all. – Ry- Dec 15 '19 at 11:54
  • @WhozCraig Output remains same. I have added gcc version – chandola Dec 15 '19 at 11:54
  • I guess GCC returns NULL to protect you from UB. Check for warning message for `fun_s`. – i486 Dec 15 '19 at 12:01
  • @chandola Learned something new about gcc. That's amazing. Sure enough, there it is in the asm: `mov eax, 0` before the ret. Wow. That was without optimization. The entire function call gets tossed at -O2. – WhozCraig Dec 15 '19 at 12:03
  • @WhozCraig mov $0x0,%eax, Is this what you are pointing to? – chandola Dec 15 '19 at 12:05
  • @chandola Yeah, different asm format, same result. – WhozCraig Dec 15 '19 at 12:06
  • I believe this is a legal conversion. Since dereferencing str after leaving the function would be UB and the compiler is allowed to assume no UB, it can assume the returned pointer will never be dereferenced. So it can return whatever it wants as long as it cannot compare equal to any valid pointer to the same type. We'd need a language lawyer to confirm. – spectras Dec 15 '19 at 12:06
  • @spectras: That has since been asked and answered: [Is it UB to return a pointer to local variable?](https://stackoverflow.com/q/66606248) no, but the pointer value is "indeterminate" when the pointed-to object's lifetime has ended. – Peter Cordes Oct 16 '21 at 11:15

2 Answers

3

Your compiler is purposely injecting a null return value from that function. I don't have gcc 7.4 available, but I do have 7.3, and I assume the result is similar:

Compiling fun_s to assembly delivers this:

.LC0:
        .string "s :%p\n"
fun_s:
        push    rbp
        mov     rbp, rsp
        sub     rsp, 16
        movabs  rax, 114844764957046
        mov     QWORD PTR [rbp-8], rax
        lea     rax, [rbp-8]
        mov     rsi, rax
        mov     edi, OFFSET FLAT:.LC0
        mov     eax, 0
        call    printf
        mov     eax, 0 ; ======= HERE =========
        leave
        ret

Note that eax is hard-set to zero just before the return. Writing eax also clears the upper half of rax, the register that carries the resulting pointer back to the caller.

Making str static delivers this:

.LC0:
        .string "s :%p\n"
fun_s:
        push    rbp
        mov     rbp, rsp
        mov     esi, OFFSET FLAT:str.2943
        mov     edi, OFFSET FLAT:.LC0
        mov     eax, 0
        call    printf
        mov     eax, OFFSET FLAT:str.2943
        pop     rbp
        ret
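
For reference, the source change behind that second listing might look like the following. This is a minimal sketch, not the exact code that was compiled: the names `fun_s_static` and `check_static_variant` are hypothetical, and only the `static` keyword differs from the question's `fun_s`.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical variant of the question's fun_s: `static` gives the array
   static storage duration, so it outlives the call and the returned
   pointer stays valid. */
char *fun_s_static(void) {
    static char str[8] = "vikash";
    printf("s :%p\n", (void *)str);
    return str;
}

/* Hypothetical helper listing what a caller may rely on. */
void check_static_variant(void) {
    char *p = fun_s_static();
    assert(p != NULL);                 /* a real address, not a rewritten null */
    assert(strcmp(p, "vikash") == 0);  /* contents are still intact */
    assert(p == fun_s_static());       /* every call returns the same object */
}
```

The trade-off is that all callers share one buffer, so this variant is not reentrant.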

In short, your compiler is detecting the return of a local address and rewriting it to be NULL. In doing so, it prevents any later nefarious use of that address (e.g. a content injection attack).

I see no reason the compiler should not be allowed to do this. I'm sure a language purist will confirm or reject that suspicion.
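
If the goal is to hand a string on the stack back to the caller, one common way to avoid the dangling pointer altogether is to let the caller own the storage. The sketch below assumes a hypothetical function `fun_s_into` (not part of the question's code) that fills a caller-supplied buffer:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical replacement for fun_s: the caller owns the buffer, so the
   returned pointer never outlives the object it points to.
   Returns buf on success, or NULL if the buffer is too small. */
char *fun_s_into(char *buf, size_t size) {
    static const char text[] = "vikash";

    assert(buf != NULL);
    if (size < sizeof text) {
        return NULL;                 /* not enough room for the string and its '\0' */
    }
    memcpy(buf, text, sizeof text);  /* copies the terminating '\0' as well */
    printf("s :%p\n", (void *)buf);
    return buf;
}
```

A caller would declare `char str[8];` in its own frame and call `fun_s_into(str, sizeof str)`; the address printed inside the function and the one seen by the caller are then the same stack location.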

WhozCraig
  • Thanks @WhozCraig. As mentioned by P__J__, this is a change introduced with newer gcc. Old gcc still prints the addresses; newer ones return NULL. – chandola Dec 15 '19 at 12:19
  • As pointer dereferencing is undefined behaviour, I guess the compiler is totally allowed to define *a* behaviour. That one looks like the lesser evil. – Matthieu Dec 15 '19 at 12:19
  • In real-life circumstances this feature will definitely be helpful, and since this is left undefined by the C standard, the gcc people are free to handle it however they want. But I feel like C's superpowers are taken away by such things in GCC, so I would rather get a dangling pointer than NULL. – chandola Dec 15 '19 at 12:24
  • The standard says "The value of a pointer becomes indeterminate when the object it points to reaches the end of its lifetime", so this behavior is allowed. – interjay Dec 15 '19 at 12:30
  • I completely concur with interjay. If that is how the language standard reads, the compiler could jam the lead engineer's ex-wife's phone number into that thing. That value is just as indeterminate as any other. Ultimately you cannot rely on this behavior; they just made it easier to detect at runtime. – WhozCraig Dec 15 '19 at 12:33
  • C has always allowed one to stab oneself to death if one wants to. This behavior will safeguard against that a bit. – chandola Dec 15 '19 at 12:57
2

Most modern compilers detect the return of a pointer to a local variable and actually return NULL. (Generally speaking, they are more aggressive regarding UB; the approach is to let the program fail as soon as possible or, as in this case, to make the UB detectable at runtime.) https://godbolt.org/z/pDUXmm

0___________
  • That's correct, old compilers do print addresses, as I had mentioned in the question. – chandola Dec 15 '19 at 12:16
  • If the compiler were deliberately detecting the return of a pointer to a local object, it should be issuing a warning, not just substituting a null pointer. Can you substantiate the claim that most modern compilers do this? – Eric Postpischil Dec 15 '19 at 13:25
  • You can try the link shared by P__J__. There are multiple compilers to try out there. – chandola Dec 15 '19 at 13:49
  • @EricPostpischil: GCC9.2 -O2 *does* also warn, even without `-Wall` (like current ICC, clang, and MSVC), as you can see in the Godbolt compiler-explorer link in this answer. (Hopefully you were replying to the OP, telling them to look for compiler warnings.) Only gcc returns NULL, though; the rest do return a pointer to the local `char[]` array, not trying to break callers that deref it by returning NULL. https://godbolt.org/z/jTPxv6KhW. – Peter Cordes Oct 16 '21 at 11:11
  • Note that it's not UB to just return it without deref. However, [the pointer value "becomes indeterminate"](https://stackoverflow.com/q/66606248) past the lifetime of the pointed-to object, so what GCC's doing is legal. Only unhelpful if you're trying to find out the approximate value of the stack pointer this way. – Peter Cordes Oct 16 '21 at 11:14
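
If the approximate stack address is what you are after, one possible workaround is to convert the address to an integer while the local object is still alive and return that integer instead of a pointer. This is only a sketch: `approx_stack_address` is a hypothetical helper, it assumes the implementation provides `uintptr_t` (optional in the standard, though widely available), and the pointer-to-integer conversion itself is implementation-defined.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: the conversion to an integer happens while `probe`
   is still alive, so no pointer with an indeterminate value is ever
   returned to the caller. */
uintptr_t approx_stack_address(void) {
    char probe = 0;
    uintptr_t addr = (uintptr_t)(void *)&probe;

    printf("stack ~ 0x%" PRIxPTR "\n", addr);
    return addr;
}
```

The returned number is only good for printing or rough comparison; converting it back to a pointer and dereferencing it would bring back the original lifetime problem.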