
Having recently switched to C, I've been told a thousand ways to Sunday that referencing a value that hasn't been initialized isn't good practice and leads to unexpected behavior. Specifically (because my previous language initializes integers to 0), I was told that uninitialized integers might not be equal to zero. So I decided to put that to the test.

I wrote the following piece of code to test this claim:

#include <stdlib.h>
#include <stdio.h>
#include <stdbool.h>
#include <assert.h>

int main(){
    size_t counter = 0;
    size_t testnum = 2000; //The number of ints to allocate and test.
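    for(size_t i = 0; i < testnum; i++){ //size_t to match testnum
        int* temp = malloc(sizeof(int));
        assert(temp != NULL); //Just in case there's no space.
        if(*temp == 0) counter++; //Reads uninitialized memory: this is the undefined behavior under test.
        //temp is deliberately never freed, so each iteration gets memory not handed out before.
    }
    printf(" %zu",counter); //%zu is the correct conversion for size_t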
    return 0;
}

I compiled it like so (in case it matters):

gcc -std=c99 -pedantic name-of-file.c

Based on what my instructors had said, I expected temp to point to an unpredictable integer and the counter to be incremented only rarely. However, my results blow this assumption out of the water:

testnum     printed count
2           2
20          20
200         200
2000        2000
20000       20000
200000      200000
2000000     2000000
...         ...

The results continue the same way for a couple more powers of 10 (times 2), but you get the point.

I then tested a variant of the code above. I allocated an integer array, set every even index to its previous (uninitialized) value plus 1, and freed the array. Then I ran the code above, testing as many integers as the array held (i.e. testnum). These results are much more interesting:

testnum     printed count
2           2
20          20
200         175
2000        1750
20000       17500
200000      200000
2000000     2000000
...         ...

Based on this, it's reasonable to conclude that malloc reuses freed memory (obviously). Some of the new integer pointers point to addresses that still contain the previously incremented integers. My question is why all of my integer pointers in the first test consistently point to a value of 0. Shouldn't they point to whatever empty space on the heap my computer has offered the program? That space could (and, at some point, should) contain non-zero values.

In other words, why does it seem like all of the new heap space my C program has access to has been wiped to all 0s?

Lavaman65
  • Why test undefined behaviour? It depends on the OS/compiler/whatever. Would you bet your life on that? – Jean-François Fabre Aug 01 '17 at 21:14
  • @Jean-FrançoisFabre I wanted to see if initializing a bunch of integers would really have them contain unpredictable values as I was told. I said that in the first paragraph of the question, though. – Lavaman65 Aug 01 '17 at 21:15
  • It's an implementation detail specific to your environment. That's it. You haven't been lied to and you haven't made a groundbreaking discovery. – Sean Bright Aug 01 '17 at 21:16
  • No, that's not a good way to generate random values. Try that on another OS or compiler and you may get a different result. – Jean-François Fabre Aug 01 '17 at 21:16
  • @UnknowableIneffible: They will generally contain *unpredictable* values (which are not guaranteed to be stable on top of that). But "random"? "Random" is not the right term to use in this case. – AnT stands with Russia Aug 01 '17 at 21:17
  • @AnT Fair enough. I'll edit the comment. – Lavaman65 Aug 01 '17 at 21:17
  • You will often get different results when you compile with debug flags turned on vs. off, and optimization turned on vs. off. – bruceg Aug 01 '17 at 21:18
  • *not leading to unspecified behavior as expected* - Isn't it ringing an alarm? How can you *expect* some *specific* *unspecified* behavior? – Eugene Sh. Aug 01 '17 at 21:19
  • @EugeneSh. That's fair. I was just under the impression that, at some point, these integer pointers might point to a value that isn't 0, when it doesn't seem to be happening with what I am doing. – Lavaman65 Aug 01 '17 at 21:20
  • @UnknowableIneffible, and at some point (after you've initialized and `free`'d) they are pointing to non-zero values. – Sean Bright Aug 01 '17 at 21:21
  • @SeanBright Right, but that doesn't seem to happen with new heap space that I hadn't freed before, which is what I was curious about. – Lavaman65 Aug 01 '17 at 21:22
  • @UnknowableIneffible, the answer to your question relies on your compiler and OS, so that is where you should start your search. – Sean Bright Aug 01 '17 at 21:24
  • @SeanBright Thanks. I'll take a look, and if I find anything interesting I'll post it as an answer. – Lavaman65 Aug 01 '17 at 21:25
  • "I've been told a thousand ways to Sunday that referencing a value that hasn't been initialized isn't safe" --> referencing uninitialized `unsigned char` _is_ safe. The value read may be consistent, it may vary, but it is _safe_ - not a trap. – chux - Reinstate Monica Aug 01 '17 at 21:26
  • Just because `malloc` is not *required* to zero the data, does not mean it *won't*, and some implementations do it for security reasons, so that you can't "peek" some previous data from elsewhere. – Weather Vane Aug 01 '17 at 21:27
  • @chux That's true. I'll fix the post. – Lavaman65 Aug 01 '17 at 21:27
  • "My question is why all of my integer pointers in the first test consistently point to 0. " --> Try storing random data in the allocation and _then_ free it. Try allocating random sizes. The test is not robust. A key factor is that the code does not write anything to the allocated memory. Many OSes simply map allocated memory to a "zero" page and keep it mapped there until something _interesting_ is written. See [Why is malloc not “using up” the memory on my computer?](https://stackoverflow.com/q/19991623/2410359) – chux - Reinstate Monica Aug 01 '17 at 21:31
  • To see what happens, you need to examine the machine code produced by the compiler, or run the program under debugger in disassembly mode, machine code instruction by instruction. – hyde Aug 01 '17 at 21:35
  • Answered by [What is undefined behaviour?](https://stackoverflow.com/a/4105123/1505939) – M.M Aug 01 '17 at 21:51
  • The program leaks memory like mad...doesn't it crash for sufficiently large values of testnum? – Brad S. Aug 01 '17 at 22:08
  • @BradS. Yeah, but that was sort of the point. I wanted pointer values I hadn't seen before, not ones that I had previously freed. Not that I wanted to leak a bunch of memory, but it was the only way I could think to do it. – Lavaman65 Aug 01 '17 at 22:18

4 Answers


As you already know, you are invoking undefined behavior, so all bets are off. To explain the particular results you are observing ("why is uninitialized memory that I haven't written to all zeros?"), you first have to understand how malloc works.

First of all, malloc does not just directly ask the system for a page whenever you call it. It has an internal "cache" from which it can hand you memory. Let's say you call malloc(16) twice. On the first call, it scans the cache, sees that it's empty, and requests fresh memory from the OS (at least one page, which is 4 KB on most systems). It then splits this memory into two chunks, gives you the smaller chunk, and saves the other chunk in its cache. On the second call, it sees that it has a large enough chunk in its cache and allocates memory by splitting that chunk again.

Freeing memory simply returns it to the cache. There, it may (or may not) be merged with other chunks to form a bigger chunk, and is then used for other allocations. Depending on the details of your allocator, it may also choose to return free pages to the OS if possible.

Now the second piece of the puzzle -- any fresh pages you obtain from the OS are filled with 0s. Why? Imagine it simply handed you an unused page that was previously used by some other process that has now terminated. Now you have a security problem, because by scanning that "uninitialized memory", your process could potentially find sensitive data such as passwords and private keys that were used by the previous process. Note that there is no guarantee by the C language that this happens (it may be guaranteed by the OS, but the C specification doesn't care). It's possible that the OS filled the page with random data, or didn't clear it at all (especially common on embedded devices).

Now you should be able to explain the behavior you're observing. The first time, you are obtaining fresh pages from the OS, so they are zero-filled (again, this is an implementation detail of your OS, not the C language). However, if you malloc, free, then malloc again, there is a chance that you are getting back the same memory that was in the cache. This cached memory is not wiped, since the only process that could have written to it was your own. Hence, you just get whatever data was previously there.

Note: this explains the behavior for your particular malloc implementation. It doesn't generalize to all malloc implementations.

Andrew Sun

First off, you need to understand that C is a language described in a standard and implemented by several compilers (gcc, clang, icc, ...). In several places, the standard says that certain expressions or operations result in undefined behavior.

What is important to understand is that this means you have no guarantees on what the behavior will be. In fact any compiler/implementation is basically free to do whatever it wants!

In your example, this means you cannot make any assumptions about what the uninitialized memory will contain. Assuming it will be random, or that it will contain elements of a previously freed object, is just as wrong as assuming that it is zero, because any of these could happen at any time.

Many compilers (or OSes) will consistently do the same thing (such as the 0s you observed), but that is also not guaranteed.

(To maybe see different behaviors, try using a different compiler or different flags.)

Elmar Peise
  • See my comment above. He is not reusing any memory; he is leaking memory. I do not know of any modern system that exposes unzeroed memory to a newly started program, for security reasons. Memory that is not zeroed may contain sensitive information, like passwords or credit card numbers, left by previous programs/users. – 0___________ Aug 01 '17 at 21:53
  • Compilers do not clear any memory. That can only be done by the program startup routines. – 0___________ Aug 01 '17 at 22:17
  • In his second example, where he allocs, inits, frees an array, he could well be reusing those memory locations. Compilers are unlikely to clear memory, but is it guaranteed that they don't? I mean, they could also fill it with 0xDEADBEEF, couldn't they? – Elmar Peise Aug 01 '17 at 23:05
  • Compilers do not. It is guaranteed. It is done by the startup code linked to the compiled files. You can write your own startup (which I do quite often in my embedded bare metal projects). Modern OSes always clear all memory accessed by the program before passing execution to it - for security reasons – 0___________ Aug 01 '17 at 23:10

Undefined behavior does not mean "random behavior" nor does it mean "the program will crash." Undefined behavior means "the compiler is allowed to assume that this never happens," and "if this does happen, the program could do anything." Anything includes doing something boring and predictable.

Also, the implementation is allowed to define any instance of undefined behavior. For instance, ISO C never mentions the header unistd.h, so #include <unistd.h> has undefined behavior, but on an implementation conforming to POSIX, it has well-defined and documented behavior.

The program you wrote is probably observing uninitialized malloced memory to be zero because, nowadays, the system primitives for allocating memory (sbrk and mmap on Unix, VirtualAlloc on Windows) always zero out the memory before returning it. That's documented behavior for the primitives, but it is not documented behavior for malloc, so you can only rely on it if you call the primitives directly. (Note that only the malloc implementation is allowed to call sbrk.)

A better demonstration is something like this:

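#include <stdio.h>
#include <stdlib.h>
int
main(void)
{
    {
        unsigned int *x = malloc(sizeof(unsigned int));
        if (x == NULL)
            return 1;
        *x = 0xDEADBEEF; /* fits in unsigned int; would overflow a 32-bit int */
        free(x);
    }
    {
        unsigned int *y = malloc(sizeof(unsigned int));
        if (y == NULL)
            return 1;
        printf("%08X\n", *y); /* reads uninitialized memory: the point of the demo */
        free(y);
    }
    return 0;
}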

which has pretty good odds of printing "DEADBEEF" (but is allowed to print 00000000, or 5E5E5E5E, or make demons fly out of your nose).

Another better demonstration would be any program that makes a control-flow decision based on the value of an uninitialized variable, e.g.

int foo(int x)
{
    int y;
    if (y == 5)
        return x;
    return 0;
}

Current versions of gcc and clang will generate code that always returns 0, but the current version of ICC will generate code that returns either 0 or the value of x, depending on whether register EDX is equal to 5 when the function is called. Both possibilities are correct. Generating code that always returns x would be just as correct, and so would generating code that makes demons fly out of your nose.

zwol

Useless deliberations, wrong assumptions, wrong test. In your test, every iteration mallocs sizeof(int) bytes of fresh memory. To see the UB you wanted to see, you should put something in that allocated memory and then free it. Otherwise you do not reuse it; you just leak it. Most OSes clear all the memory allocated to a program before executing it, for security reasons. So when the program starts, everything is zeroed or initialised to its static values.

Change your program to:

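#include <stdlib.h>
#include <stdio.h>
#include <assert.h>

int main(){
    size_t counter = 0;
    size_t testnum = 2000; //The number of ints to allocate and test.
    for(size_t i = 0; i < testnum; i++){
        int* temp = malloc(sizeof(int));
        assert(temp != NULL); //Just in case there's no space.
        if(*temp == 0) counter++; //Still reads uninitialized memory (UB).
        *temp = rand(); //Write something into the block...
        free(temp);     //...and return it so the next malloc can hand it back.
    }
    printf(" %zu",counter);
    return 0;
}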
0___________