Why is this a dangling pointer in C?

Question

Why is ptr a dangling pointer? I know "ch" is out of scope, but the address of ch is still valid outside the inner block. And when I print *ptr I get the correct value, i.e. 5.

#include <stdio.h>

int main(void)
{
   int *ptr;

   {
       int ch = 5;
       ptr = &ch;
   }

   printf("%d", *ptr);
}

"but address of ch is still valid out of inner block". No. Variable `ch` doesn't exist anymore after the block, so the compiler is free to put it in a place that isn't valid after the block. It could also realize that the address of `ch` is never used and leave `ch` in a register. Or even remove the whole block, since it's not used for anything. Of course nobody would think of using an address of a variable that doesn't exist anymore, no? — Costantino Grana, Jul 17 '21 at 07:13
How can an address of an "out of scope" variable be valid? Makes no sense. — Support Ukraine, Jul 17 '21 at 07:44

Jabberwocky · Accepted Answer · 2021-07-17T18:52:38.267

ptr is a pointer to memory that doesn't belong to you anymore and that can contain anything, including the value you expect. The 5 you see is just a leftover; it could be overwritten at any time. What you see here is undefined behaviour.

In this simple case, the code the compiler generates is most likely the same as what it would generate for the following program (which is perfectly legal), and that's probably why you get 5:

#include <stdio.h>

int main(void)
{
   int *ptr;

   int ch = 5;
   ptr = &ch;

   printf("%d", *ptr);   /* fine: ch is still alive here */
}

Consider this case which is a bit more complicated:

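#include <stdio.h>

int *foo(void)
{
    int ch = 5;
    return &ch;           /* the lifetime of ch ends when foo returns */
}

int main(void)
{
  int* ptr = foo();       /* ptr is dangling from the start */

  printf("%d ", *ptr);    /* undefined behaviour */
  printf("%d ", *ptr);    /* undefined behaviour */
}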

Here the output might be something like this:

5 45643

The first time you may get 5 because the memory has not yet been overwritten, the second time you get something else because in the meantime the memory has been overwritten.

Be aware that the output could be anything else or it could even crash, because this is undefined behaviour.

Also read this: "Can a local variable's memory be accessed outside its scope?". That question is about C++, but the answers also apply to the C language.

"... it could even crash" Exactly. And (at least some versions of) gcc returns NULL instead of a normal address, which causes the dereference to segfault. — Support Ukraine, Jul 17 '21 at 07:54
@4386427 good to know that gcc returns a null pointer, it also issues a warning. — Jabberwocky, Jul 17 '21 at 08:21

AlexM · Answer 2 · 2021-07-18T15:51:00.273

You are most likely seeing this because you compiled your code without optimizations. With optimizations turned on, the output of your application becomes unpredictable, because the code violates the lifetime rules that come with scopes in C and C++.

Without compile-time optimizations you can still get some predictability even when you break the semantic rules. This is because the compiler tends to generate code in the order and with the logic that was written.

Once optimizations kick in, only the semantic rules of your programming language give you control and predictability over the resulting machine code. That's why production code (where you almost always want optimizations turned on in release binaries) should never rely on these academic hacks.

The longer explanation

The way the compiler manages the stack follows two types of contracts:

a strong contract - like function calls between different binaries (such as shared libraries), governed by what is called the calling convention. Roughly speaking, the calling convention defines how the stack frame is managed when a function is called. This is a strong contract because it does not change with optimization settings, other compiler settings, or even different versions of the compiler. Otherwise, the ABI would break.
a weak contract - like local variables within a function, a statement, or a compound statement, or calls to functions that are only visible within one compilation unit. There is no standard for how the compiler manages the stack here. It can do whatever it wants, as long as it follows the semantics of the programming language, and this code is a target for compile-time optimization algorithms.

In your example and in mine (see below), the semantics are broken: we define a compound statement, exit its scope, but still keep (and use) references to memory used within that scope.

For example

Let's extend your example as follows and save it as local.c:

#include <stdio.h>

int main(void) {
    int *ptr1, *ptr2;

    {
        int ch = 5;
        ptr1 = &ch;
    } 
    {
        int ch = 10;
        ptr2 = &ch;
    }

    printf(
        "pointer1: %d\n"
        "pointer2: %d\n",
        *ptr1, *ptr2
    );
    
    return 0;
}

Now, let's use gcc and compile this in two different ways, to see what happens:

with optimizations disabled
with optimizations enabled

1. With optimizations disabled

# gcc local.c -O0 -o local; ./local
pointer1: 10
pointer2: 10

Well, we see that both ptr1 and ptr2 point to the same location. This makes some sense: after the first compound statement closes, the compiler reuses its reserved space for the second one. That reuse is what we allow once we delimit the scope of those compound statements with the { and } braces.

This is what you are experiencing with your example too. You are saving an address pointing to a stack location that the compiler knows is free to be reused as soon as it hits the closing brace }. Your example, however, has no following statement that would show the effect in action.

2. With optimizations enabled

# gcc local.c -O1 -o local; ./local
pointer1: 0
pointer2: 0

Wait, what?

Yes, the same code produces two different outputs. With optimizations turned on, the behavior changes, and now the compiler decided to replace your code with something that is faster or smaller in size.

Experimenting with function stack frames

For fun, let's try the same with functions:

#include <stdio.h>

void fn_set(void) { char a = 5; printf("fn_set: a=%d\n", a); }
void fn_get(void) { char a    ; printf("fn_get: a=%d\n", a); }  /* a is deliberately uninitialized */

int main(void) {
    fn_set();
    fn_get();
    return 0;
}

Naively, we might expect fn_get to print 5, like in our previous example, because its local a would land in the stack slot that fn_set just used.

And let's test this again:

# gcc local.c -O0 -o local; ./local # without optimizations
fn_set: a=5
fn_get: a=5

# gcc local.c -O1 -o local; ./local # with optimizations enabled
fn_set: a=5
fn_get: a=0

The result matches the earlier experiment: the output depends on the optimization level. In theory, fn_set and fn_get have the same stack frame layout, so their locals should overlap nicely. In practice, no rule of the language guarantees that. Reading the uninitialized variable a in fn_get is itself undefined behaviour, so the optimizer is free to keep a in a register or substitute any value, and it goes for the simplest/fastest version.