
I want to know how variables are initialized:

#include <stdio.h>
int main( void )
{
    int ghosts[3];
    for (int i = 0; i < 3; i++)
        printf("%d\n", ghosts[i]);
    return 0;
}

This gets me random values like -12, 2631, 131... Where did they come from?

For example with GCC on x86-64 Linux: https://godbolt.org/z/MooEE3ncc

I have a guess to answer my question, it could be wrong anyways:
The registers of the memory after they are 'emptied' get random voltages between 0 and 1, these values get 'rounded' to 0 or 1, and these random values depend on something?! Maybe the way registers are made? Maybe the capacity of the memory comes into play somehow? And maybe even the temperature?!!

Peter Cordes
Yahia
  • They come from the memory this array is occupying. Using uninitialized variables in C is undefined behavior. – Eugene Sh. Feb 25 '22 at 15:25
  • They're not mathematically random, they are "random" in the sense that they can be anything (including your bank balance if you connected to your bank in the previous 10 minutes (your bank balance is not random, is it?)) – pmg Feb 25 '22 at 15:30
  • @EugeneSh. Not necessarily. It's only UB if the value in question happens to be a trap representation or if the object in question never had its address taken. I've yet to come across an implementation with trap representation for an `int`, and elements of an array have to have their address taken to access them. – dbush Feb 25 '22 at 15:31
  • @dbush I remember this discussion several times here, and I am not sure what was the definitive conclusion. only remember many arguments. FWIW, the C standard, Annex J2 is listing "*The value of an object with automatic storage duration is used while it is indeterminate*" as UB, but one of the claims was that it is not normative. – Eugene Sh. Feb 25 '22 at 15:34
  • I have a guess to answer my question, it could be wrong anyways: the registers of the memory after they are 'emptied' get random voltages between 0 and 1, these values get 'rounded' to 0 or 1, and these random values depend on something?! Maybe the way registers are made? Maybe the capacity of the memory comes into play somehow? And maybe even the temperature?!! – Yahia Feb 25 '22 at 15:46
  • @EugeneSh. I drew my conclusion from 6.3.2.1p2: *"If the lvalue designates an object of automatic storage duration that could have been declared with the register storage class (never had its address taken), and that object is uninitialized (not declared with an initializer and no assignment to it has been performed prior to use), the behavior is undefined."* – dbush Feb 25 '22 at 15:47
  • There is no memory "emptied". The memory holds the value that just happens to be there when your function is called. There is no special treatment applied to that memory. It is up to you to provide initial values. – Gerhardh Feb 25 '22 at 15:51
  • @EugeneSh. Found a thread on this: https://stackoverflow.com/questions/11962457/why-is-using-an-uninitialized-variable-undefined-behavior – dbush Feb 25 '22 at 16:10
  • Sometimes it is the last value stored in that place, which is why bzero is used for security; the value could just be whatever random "stray" charge is/was there. – Andrew Feb 25 '22 at 16:48
  • Yahia: Your computer doesn't reboot or power cycle every time you run a new program. Every bit of storage in memory or registers your program can use has a value left there by some previous instruction, either in this program or in the OS before it started this program. (Modern OSes zero memory and registers to avoid information leaks of kernel data and data from other processes; @pmg's suggestion of seeing your bank balance is not plausible in real life unless you put this code inside a modified version of firefox or chromium.) – Peter Cordes Feb 26 '22 at 00:35
  • @Yahia: You should edit that guess into your question, if that's the kind of angle you're approaching this question from. (It's not at all correct except for some systems on power-up, but it makes it clearer what kind of answer you're looking for.) Actually I might as well edit for you. – Peter Cordes Feb 26 '22 at 00:38
  • in practice on a real Itanium CPU you'll get segfault because of the [NaT](https://devblogs.microsoft.com/oldnewthing/20150727-00/?p=90821) thing. [Is it undefined behavior to return an uninitialized, ultimately unused, struct?](https://stackoverflow.com/q/50751665/995714), [(Why) is using an uninitialized variable undefined behavior?](https://stackoverflow.com/q/11962457/995714) – phuclv Feb 26 '22 at 01:26
  • if you think about it you can control what gets printed...just try – old_timer Feb 26 '22 at 02:54
  • @phuclv the "not-a-thing thing" you wrote about is the perfect example for the "RAS syndrome" :) or actually goes beyond because it contradicts itself now – CherryDT Feb 26 '22 at 18:26

4 Answers


Your computer doesn't reboot or power cycle every time you run a new program. Every bit of storage in memory or registers your program can use has a value left there by some previous instruction, either in this program or in the OS before it started this program.

If it did power-cycle each time, e.g. on a microcontroller, then yes, each bit of storage might settle into a 0 or 1 state during the voltage fluctuations of powering on, except in storage engineered to power up in a certain state. (DRAM is more likely to be 0 on power-up, because its capacitors will have discharged.) But you'd also expect there to be internal CPU logic that does some zeroing or setting of things to a guaranteed state before fetching and executing the first instruction of code from the reset vector (a memory address); system designers normally arrange for there to be ROM at that physical address, not RAM, so they can put non-random bytes of machine code there. Code that executes at that address should probably assume random values for all registers.

But you're writing a simple user-space program that runs under an OS, not the firmware for a microcontroller, embedded system, or mainstream motherboard, so power-up randomness is long in the past by the time anything loads your program.


Modern OSes zero registers on process startup, and zero memory pages allocated to user-space (including your stack space), to avoid information leaks of kernel data and data from other processes. So the values must come from something that happened earlier inside your process, probably from dynamic linker code that ran before main and used some stack space.

Reading the value of a local variable that's never been initialized or assigned is not actually undefined behaviour in this case, because the array couldn't have been declared register int ghosts[3]; that's an error (Godbolt) because ghosts[i] effectively uses the address. See (Why) is using an uninitialized variable undefined behavior? In this case, all the C standard has to say is that the value is indeterminate. So it does come down to implementation details, as you expected.

When you compile without optimization, compilers don't even notice the uninitialized read because they don't track usage across C statements. (This means everything is treated somewhat like volatile, only loading values into registers as needed for a statement, then storing again.)

In the example Godbolt link I added to your question, notice that -Wall doesn't produce any warnings at -O0, and just reads from the stack memory it chose for the array without ever writing it. So your code is observing whatever stale value was in memory when the function started. (But as I said, that must have been written earlier inside this program, by C startup code or dynamic linking.)

With gcc -O2 -Wall, we get the warning we'd expect: warning: 'ghosts' is used uninitialized [-Wuninitialized], but it does still read from stack space without writing it.

Sometimes GCC will invent a 0 instead of reading uninitialized stack space, but that doesn't happen in this case. There's zero guarantee about how it compiles: the compiler sees the use-uninitialized "bug" and can invent any value it wants, e.g. reading some register it never wrote instead of that memory. For example, since you're calling printf, GCC could have just left ESI uninitialized between printf calls, since that's where ghosts[i] is passed as the 2nd arg in the x86-64 System V calling convention.


Most modern CPUs, including x86, don't have any "trap representations" that would make an add instruction fault, and even if they did, the C standard doesn't guarantee that the indeterminate value isn't a trap representation. But IA-64 did have a Not A Thing register result from bad speculative loads, which would trap if you tried to read it. See comments on the trap representation Q&A, and Raymond Chen's article: Uninitialized garbage on ia64 can be deadly.

The ISO C rule about it being UB to read uninitialized variables that were candidates for register might be aimed at this, but with optimization enabled you could plausibly still run into this anyway if the taking of the address happens later, unless the compiler takes steps to avoid it. But ISO C defect report N1208 proposes saying that an indeterminate value can be "a value that behaves as if it were a trap representation" even for types that have no trap representations. So it seems that part of the standard doesn't fully cover ISAs like IA-64, the way real compilers can work.

Another case that's not exactly a "trap representation": note that only some object-representations (bit patterns) are valid for _Bool in mainstream ABIs, and violating that can crash your program: Does the C++ standard allow for an uninitialized bool to crash a program?

That's a C++ question, but I verified that GCC will return garbage without booleanizing it to 0/1 if you write _Bool b[2] ; return b[0]; https://godbolt.org/z/jMr98547o. I think ISO C only requires that an uninitialized object has some object-representation (bit-pattern), not that it's a valid one for this object (otherwise that would be a compiler bug). For most integer types, every bit-pattern is valid and represents an integer value. Besides reading uninitialized memory, you can cause the same problem using (unsigned char*) or memcpy to write a bad byte into a _Bool.


An uninitialized local doesn't have "a value"

As shown in the following Q&As, when compiling with optimization, multiple reads of the same uninitialized variable can produce different results:

The other parts of this answer are primarily about where a value comes from in un-optimized code, when the compiler doesn't really "notice" the uninitialized read.

Peter Cordes
  • Does any CPU have an integer "trap" value in that reading such value to a register or performing arithmetic operations on it will cause a fault? – xiver77 Feb 26 '22 at 04:32
  • @xiver77: Yes, some one's complement machines trap on signed-integer arithmetic with `-0` (bit pattern `0xffffffff` for 32-bit). There's also stuff like IA-64 "Not A Thing" where a register result of a speculative load from a bad address (which bails out and sets a status somewhere else, instead of actually faulting on a bad address, IIRC) will trap if you try to read it. See comments on [trap representation](https://stackoverflow.com/posts/comments/18346846) – Peter Cordes Feb 26 '22 at 05:06
  • @xiver77: Also see my update to this answer inspired your question, thanks. Uninitialized _Bool can crash. (Finished for now adding to that section.) – Peter Cordes Feb 26 '22 at 05:28

The registers of the memory after they are 'emptied' get random voltages between 0 and 1,

Nothing so mysterious. You are just seeing what was written to those memory locations last time they were used.

When memory is released it is not cleared or emptied. The system just knows that it's free, and the next time somebody needs memory it just gets handed over; the old contents are still there. It's like buying an old car and looking in the glove compartment: the contents are not mysterious, it's just a surprise to find a cigarette lighter and one sock.

Sometimes in a debugging environment freed memory is filled with some identifiable value so that it's easy to recognize that you are dealing with uninitialized memory. For example, 0xccccccccccc or maybe 0xdeadbeefDeadBeef.

Maybe a better analogy: you are eating in a self-serve restaurant that never cleans its plates. When customers have finished, they put their plates back on the 'free' pile. When you go to serve yourself, you pick up the top plate from the free pile. You should clean the plate, otherwise you get what was left there by the previous customer.

pm100
  • You're omitting the very important fact that kernels avoid leaking information between privilege domains. i.e. any memory they map into a user-space process, e.g. as its stack space, will be zeroed (or filled with file contents for `mmap`, or the executable's text & data sections). So your analogies don't apply **system**-wide, only within this one process. (Where CRT startup code used the stack before `main`.) But then yes, those analogies work perfectly for reuse of uninitialized stack space, if this isn't the deepest the stack has ever grown. – Peter Cordes Feb 26 '22 at 09:06
  • @PeterCordes I didnt want to go too deep, but good point – pm100 Feb 26 '22 at 17:48

I am going to use a platform where it is easy to see what is going on. The compilers and platforms work the same way independent of architecture, operating system, etc. There are exceptions of course...

In main I am going to call this function:

test();

Which is:

extern void hexstring ( unsigned int );

void test ( void )
{
    unsigned int x[3];
    hexstring(x[0]);
    hexstring(x[1]);
    hexstring(x[2]);
}

hexstring is just a printf("%08X\n",x).

Build it (not using x86, using something that is overall easier to read for this demonstration)

test.c: In function ‘test’:
test.c:7:2: warning: ‘x[0]’ is used uninitialized in this function [-Wuninitialized]
    7 |  hexstring(x[0]);
      |  ^~~~~~~~~~~~~~~
test.c:8:2: warning: ‘x[1]’ is used uninitialized in this function [-Wuninitialized]
    8 |  hexstring(x[1]);
      |  ^~~~~~~~~~~~~~~
test.c:9:2: warning: ‘x[2]’ is used uninitialized in this function [-Wuninitialized]
    9 |  hexstring(x[2]);
      |  ^~~~~~~~~~~~~~~

The disassembly of the compiler output shows

00010134 <test>:
   10134:   e52de004    push    {lr}        ; (str lr, [sp, #-4]!)
   10138:   e24dd014    sub sp, sp, #20
   1013c:   e59d0004    ldr r0, [sp, #4]
   10140:   ebffffdc    bl  100b8 <hexstring>
   10144:   e59d0008    ldr r0, [sp, #8]
   10148:   ebffffda    bl  100b8 <hexstring>
   1014c:   e59d000c    ldr r0, [sp, #12]
   10150:   e28dd014    add sp, sp, #20
   10154:   e49de004    pop {lr}        ; (ldr lr, [sp], #4)
   10158:   eaffffd6    b   100b8 <hexstring>

We can see that the stack area is allocated:

   10138:   e24dd014    sub sp, sp, #20

But then we go right into reading and printing:

   1013c:   e59d0004    ldr r0, [sp, #4]
   10140:   ebffffdc    bl  100b8 <hexstring>

So it prints whatever was on the stack. The stack is just memory with a special hardware pointer.

And we can see the other two items in the array are also loaded and printed.

So whatever was in that memory at this time is what gets printed. Now the environment I am in likely zeroed the memory (including stack) before we got there:

00000000 
00000000 
00000000 

Now I am optimizing this code to make it easier to read, which adds a few challenges.

So what if we did this:

test2();
test();

In main and:

void test2 ( void )
{
    unsigned int y[3];
    y[0]=1;
    y[1]=2;
    y[2]=3;
}

test2.c: In function ‘test2’:
test2.c:5:15: warning: variable ‘y’ set but not used [-Wunused-but-set-variable]
    5 |  unsigned int y[3];
      |  

and we get:

00000000 
00000000 
00000000 

but we can see why:

00010124 <test>:
   10124:   e52de004    push    {lr}        ; (str lr, [sp, #-4]!)
   10128:   e24dd014    sub sp, sp, #20
   1012c:   e59d0004    ldr r0, [sp, #4]
   10130:   ebffffe0    bl  100b8 <hexstring>
   10134:   e59d0008    ldr r0, [sp, #8]
   10138:   ebffffde    bl  100b8 <hexstring>
   1013c:   e59d000c    ldr r0, [sp, #12]
   10140:   e28dd014    add sp, sp, #20
   10144:   e49de004    pop {lr}        ; (ldr lr, [sp], #4)
   10148:   eaffffda    b   100b8 <hexstring>

0001014c <test2>:
   1014c:   e12fff1e    bx  lr

test didn't change, but test2 is dead code, as one would expect when optimized, so it did not actually touch the stack. But what if we do this:

test2.c

void test3 ( unsigned int * );

void test2 ( void )
{
    unsigned int y[3];
    y[0]=1;
    y[1]=2;
    y[2]=3;
    test3(y);
}

test3.c

void test3 ( unsigned int *x )
{
}

Now

0001014c <test2>:
   1014c:   e3a01001    mov r1, #1
   10150:   e3a02002    mov r2, #2
   10154:   e3a03003    mov r3, #3
   10158:   e52de004    push    {lr}        ; (str lr, [sp, #-4]!)
   1015c:   e24dd014    sub sp, sp, #20
   10160:   e28d0004    add r0, sp, #4
   10164:   e98d000e    stmib   sp, {r1, r2, r3}
   10168:   eb000001    bl  10174 <test3>
   1016c:   e28dd014    add sp, sp, #20
   10170:   e49df004    pop {pc}        ; (ldr pc, [sp], #4)

00010174 <test3>:
   10174:   e12fff1e    bx  lr

test2 is actually putting stuff on the stack. Now, the calling conventions generally require that the stack pointer is back where it started when you were called. That means function a might move the pointer and read/write some data in that space, then call function b, which moves the pointer and reads/writes some data in its space, and so on. When each function returns, it usually does not make sense to clean up: you just move the pointer back and return, and whatever data you wrote to that memory remains.

So test2 writes a few things to the stack memory space and then returns, and then another function is called at the same level as test2. In this example, the stack pointer is at the same address when test() is called as when test2() was called. So what happens?

00000001 
00000002 
00000003 

We have managed to control what test() is printing out. Not magic.

Now rewind back to the 1960s and then work forward to the present, particularly 1980s and later.

Memory was not always cleaned up before your program ran. As some folks here are implying, if you were doing banking on a spreadsheet, then closed that program and opened this one, back in the day you would almost expect to see some data from that spreadsheet program: maybe the binary, maybe the data, maybe something else. Due to the nature of the operating system's use of memory, it may be a fragment of the last program you ran, a fragment of the one before that, a fragment of a program still running that just did a free(), and so on.

Naturally, once we started to get connected to each other and hackers wanted to take over and send themselves your info or do other bad things, you can see how trivial it would be to write a program to look for passwords or bank accounts or whatever.

So not only do we have protections today to prevent one program sniffing around in another program's space, we generally assume that, today, before our program gets some memory that was used by some other program, it is wiped.

But if you disassemble even a simple hello world printf program, you will see that there is a fair amount of bootstrap code that runs before main() is called. As far as the operating system is concerned, all of that code is part of our one program. So even if (let's assume) memory was zeroed or cleaned before the OS loaded and launched our program, before main, within our program, we are using the stack memory to do stuff, leaving behind values that a function like test() will see.

You may find that each time you run the same binary (one compile, many runs), the "random" data is the same. You may also find that if you add some other shared library call or something to the overall program, the shared library stuff causes extra pre-main code to run in order to be able to call the shared code. Or maybe the program takes different paths as it runs because of a side effect of a change to the overall binary, and now the random values are different but consistent.

There are explanations why the values could be different each time from the same binary as well.

There is no ghost in the machine, though. The stack is just memory. It is not uncommon for a computer to wipe that memory once when it boots, if for no other reason than to set the ECC bits. After that, the memory gets reused again and again. What happens to be in memory where the stack pointer is pointing when your program runs depends on the overall architecture of the operating system, how the compiler builds your application and shared libraries, and other factors. When you read before you write (as a rule, never read before you write; it is good that compilers now throw warnings), what you get is not necessarily random: the specific sequence of events that led to that point was not random but controlled. The values are just not ones that you as the programmer may have predicted, particularly if you do this at the main() level as you have. But be it main or seventeen levels of nested function calls, it is still just some memory that may or may not contain some stuff from before you got there. Even if the bootloader zeros memory, that is still a written zero that was left behind by some other program that came before you.

There are no doubt compilers that have features that relate to the stack that may do more work like zero at the end of the call or zero up front or whatever for security or some other reason someone thought of.

I would assume today that when an operating system like Windows or Linux or macOS runs your program, it is not giving you access to some stale memory values from some other program that came before (spreadsheet with my banking information, email, passwords, etc). But you can trivially write a program to try (just malloc() and print, or do the same thing you did but bigger to look at the stack). I also assume that program A does not have a way to get into the memory of program B that is running concurrently, at least not at the application level and not without hacking (malloc() and print is not hacking in my use of the term).

halfer
old_timer
  • It's not a CPU thing, it is a memory thing: compiler combined with operating system design. The CPU is just the machine that is told what to do. Stacks have been used for a very, very long time, so it is not a modern system thing; it was worse before (as to what you did/could see), and it keeps getting "better". – old_timer Feb 26 '22 at 17:49

The array ghosts is uninitialized, and because it was declared inside of a function and is not static (formally, it has automatic storage duration), its values are indeterminate.

This means that you could read any value, and there's no guarantee of any particular value.

dbush