If we have a function in C with a simple uninitialized `int` variable in it, we know that this variable is not guaranteed to be initialized to zero. Instead, it may contain some "garbage" value.

My question is: what exactly could that value represent? Can it be information left behind (memory that was never cleared) by a process that terminated earlier?

If yes, wouldn't this be an extremely serious security breach? That way, any process could read information left by processes that previously used the same physical memory (passwords, tokens, etc.).

My assumption is that for each new process, the kernel zeroes the memory allocated for that new process (at least for the stack) and then it loads the executable into memory. Those "garbage" values are actually values generated by the loading procedure of the current process (so that there is no way to access any left data from other processes that used the same address space).

I'm arguing with some colleagues about this and I really want a clear and comprehensive answer (I'm sure there is one). We are assuming a Debian- or CentOS-based Linux system. It would be great to know whether the behaviour differs between kernels / OSes.

Thank you respectfully.

Cosmin Ioniță
  • Such variables will have an *indeterminate* value. – Some programmer dude Oct 02 '18 at 21:52
  • Yes, but is it *possible* to access data left by another process? – Cosmin Ioniță Oct 02 '18 at 21:53
  • Maybe, or maybe not. That's kind of the point with indeterminate values, you just can't tell what they are or where they come from. Also, for types other than plain integers, the value might be a *trap representation*, and even reading those could lead to crashes. – Some programmer dude Oct 02 '18 at 21:55
  • So in other words, the kernel does not **enforce** the zeroing procedure of the memory allocated for a new process? I know that there are performance concerns regarding this issue, but it seems a very naive approach to me, from a security standpoint – Cosmin Ioniță Oct 02 '18 at 21:58
  • Modern operating systems have background tasks to zero memory after it's released by a process. So no, that *indeterminate* value is not a value from another process. – user3386109 Oct 02 '18 at 21:59
  • If that were true, then the value of the local variable would always be zero, but it seems that it's not always zero, even on modern operating systems – Cosmin Ioniță Oct 02 '18 at 22:02
  • The non-zero values on the stack are left over from the startup code. But those values are from the current process, not from other processes. – user3386109 Oct 02 '18 at 22:04
  • Oh, you are just saying that my assumption is correct? – Cosmin Ioniță Oct 02 '18 at 22:05
  • Zeroing only the stack while leaving other memory areas not zeroed is pointless. – 0___________ Oct 02 '18 at 22:06
  • Yup, although the details are slightly different, your general idea is correct. For more about the details search `virtual memory zero page` here on SO. – user3386109 Oct 02 '18 at 22:07
  • @CosminIoniță Linux will never leave any remains of memory from another process in your process, since doing so would be a big security hole. The compiled code could simply use a register for what is a variable in your source, and if it is left uninitialized you get whatever was in that register from previous operations. However, using uninitialized variables is undefined behavior, so the compiler could just generate junk code if your program is that ill-formed, which is why a comprehensive answer is not that straightforward. – nos Oct 02 '18 at 22:16
  • Please make a distinction between the OS and the C standard. The latter says an uninitialised variable is exactly that. The OS however might do other things, but the C code must not try to second-guess that. If the C standard says a variable is uninitialised: then you must treat it as such. – Weather Vane Oct 02 '18 at 22:29
  • What the language defines and what the OS does on loading/terminating are different things. You are right C is not intrinsically secure, but equally it does not saddle all applications with the overhead of unnecessary security - you add just the security you need, or execute in a secure environment (OS). – Clifford Oct 02 '18 at 22:32
  • In C, uninitialized data may contain trap values. Accessing them can end the program. – chux - Reinstate Monica Oct 02 '18 at 22:33
  • @Someprogrammerdude Hmm, perhaps it's UB, not _indeterminate_: "If the lvalue designates an object ..., and that object is uninitialized ..., the behavior is undefined." C11 §6.3.2.1 2. I think with `unsigned char`, it is indeterminate. – chux - Reinstate Monica Oct 02 '18 at 23:01
  • Possible duplicate of [What happens to a declared, uninitialized variable in C? Does it have a value?](https://stackoverflow.com/q/1597405/608639) – jww Oct 03 '18 at 21:39
  • jww: This question is not duplicated by the one you mentioned. I'm asking specifically if the garbage values in uninitialized variables can have values left by another process and NOT if there is a value or not, as the mentioned question says. – Cosmin Ioniță Oct 04 '18 at 03:20
  • Near duplicate: [Where do the values of uninitialized variables come from, in practice on real CPUs?](https://stackoverflow.com/q/71268001) which covers the fact that real OSes zero pages to avoid info leaks to user-space. And that reading the same uninitialized local multiple times might give different values! – Peter Cordes Jul 06 '22 at 02:36

2 Answers

This should be separated into two questions:

  • What does the C standard say about the value of an uninitialized object?
  • What is in memory when main is called?

The first question is discussed in other Stack Overflow questions and answers. A full answer is complicated and involves a discussion of a variety of circumstances, and this question does not seem to be asking about that particularly, so I will leave it for the other Stack Overflow questions. For this question, suffice it to say that using the value of an uninitialized object is prone to undefined behavior. Further, this is not simply because the memory of the object might have troublesome values but because the C standard permits a C implementation to treat a program that reads an uninitialized value as a misbehaving program in various ways, and optimizations can then disrupt the program further.

As far as what is in memory is concerned (supposing we have a supported way to examine it, perhaps by using assembly language instead of C), then every multiuser system that provides any sort of security erases (or otherwise initializes) memory before making it available to a process. Any values that are in memory at the time main is called are, as the question contemplates, either the result of the loading process or of initialization by the operating system. (Note that the result of the loading process includes both loading of constant data and program text—so we would expect to find the defined values there—and whatever data is leftover from the work done by the loading code—its variables and so on.)

The question asks for a clear answer, so let me be clear about this: An operating system that provides security for user processes must erase data of previous processes from memory before making that memory available to another process. Security cannot be provided by trusting a program not to examine the memory it is given and doing whatever it wants with it.

Rudimentary systems not intended for sharing by untrusted users can of course skip the initialization of memory when creating new processes and allocating memory for them.

Eric Postpischil
  • This is almost exactly the answer I was expecting. Thank you! However, can you say for certain whether this behaviour (erasing data of previous processes) is implemented by very commonly used Linux distributions, like Ubuntu / Linux Mint / Fedora / CentOS? – Cosmin Ioniță Oct 03 '18 at 08:17
  • @CosminIoniță: I do not have specific citations for specific OS versions or distributions. Let me explain that your question is equivalent to asking an automotive engineer whether cars have door locks. You do not ask the dealer when buying a car whether it has locks, because it is a known standard feature that every new car sold for general-purpose public-street use has locks. (Special cars like race cars might not.) Similarly, every general-purpose multi-user system protects memory. I have been programming 40 years, I worked 13 years in various OS groups at Digital Equipment Corporation,… – Eric Postpischil Oct 03 '18 at 13:07
  • … and 13 years as a senior software engineer at Apple. Erasing one process’ data from memory before assigning it to another process is simply fundamental; there is no question that any general-purpose multi-user system does that. That said, if you really want specific assurance about particular operating systems, you might try asking questions such as “Where in the Linux source code does it erase one process’ data from memory before reassigning it to another process?” I would expect the answer is somewhere in the virtual memory subsystem. – Eric Postpischil Oct 03 '18 at 13:08
  • Outstanding! I'm lucky to be part of a community composed of guys like you! Keep up the good work! – Cosmin Ioniță Oct 03 '18 at 13:13
  • @CosminIoniță: I did find [this document](https://www.kernel.org/doc/gorman/pdf/understand.pdf) describing Linux virtual memory management. On page 52, it says that a new page is allocated for a process simply by pointing it to a read-only shared (“globally visible”) page filled with zeros. When the process tries to write to the page for the first time, a hardware page fault is triggered, and then the system creates a new (non-shared) page, fills it with zeros, and marks it writable for the process. So, anytime a process allocates a new page, it gets a page of zeros. – Eric Postpischil Oct 03 '18 at 13:17
  • (I expect there would be additional complications, as when a process is not simply allocating new memory but is mapping a file, including loading code from its own executable file. These requests would fill the newly allocated memory with data from the proper file, thus also erasing any data that memory had from a prior process.) – Eric Postpischil Oct 03 '18 at 13:19

Well, local variables are stored in stack space, so once the call to the current routine finishes, the stack pointer moves up to free all of that routine's local variables and, for efficiency reasons, the previous contents are not erased (only the stack pointer is moved).

When you enter a new routine, the compiled code moves the stack pointer down (it doesn't push anything onto the local variable space, it just skips over that space to make room for the new set of local variables) and doesn't touch that space until the code needs a local variable. What you are asking is how to interpret the bit pattern left in the stack segment by previous use, and that depends on how the stack was used before entering the current routine. It can be:

  • remnants of temporary data used to calculate a complex expression.
  • parameter data of a previous call to another routine.
  • return addresses of previously called routines.
  • local variables of a previously called routine that has returned and no longer uses them.
  • anything else.

Since that memory is now used in a different way (as dictated by the current routine's local space), the only valid interpretation of its contents is as leftover garbage from earlier code.

Luis Colorado