17

When a process requests memory and the operating system hands it new pages, the kernel should initialize those pages (with zeros, for instance) to avoid exposing potentially confidential data that another process used. The same applies when a process starts and receives memory for, say, its stack segment.

When I run the following code on Linux, most of the allocated memory is indeed 0, but roughly 3-4 kB at the bottom of the stack (the last elements of the array, at the highest addresses) contains random numbers.

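#include <alloca.h>   // declares alloca() on Linux/glibc
#include <cstdlib>
#include <iostream>
using namespace std;

int main()
{
    // About 8 MB (with a 4-byte int) taken from the stack; alloca() never initializes it.
    int * a = (int*)alloca(sizeof(int)*2000000);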
    for(int i = 0; i< 2000000; ++i)
        cout << a[i] << endl;
    return 0;
}
  1. Why isn't it set to zero too?
  2. Could it be because it is being reused by the process?
  3. If yes, could it be the initialization code that had used those 3-4 kB of memory earlier?
jschmier
tichy
  • Great question. If it really is spillover from another process, you've just stumbled upon a pretty big security problem. Which is why I think the only thing we can be fairly sure of is that it's *not* spillover. :) – The Dag Mar 10 '11 at 21:45
  • http://stackoverflow.com/questions/1018853/why-is-alloca-not-considered-good-practice :P – AJG85 Mar 10 '11 at 21:47

5 Answers

13

The operating system does not guarantee zeroed-out memory, just that you own it. It will probably give you pages of memory that were used before (or never used before, but non-zero). If an application stores potentially sensitive data, it is expected to zero that data before calling free().

It's not set to zero because that would be unnecessary work. If you allocate 20 megabytes to store a texture or a few frames of video, why would the OS write zeroes to all that memory just so you can overwrite them as the very next thing you do?

As a general rule, operating systems don't do anything that they don't have to.

edit: to expand a little bit, when you "allocate" a block of memory, all the OS is doing is re-assigning pages of memory (blocks of 4096 bytes, typically) to your process from a pool of un-allocated pages. You can also have shared memory, in which case the OS 'assigns' them to multiple processes. That's all allocation amounts to.

yan
  • OK, but according to the Orange Book http://en.wikipedia.org/wiki/Trusted_Computer_System_Evaluation_Criteria the OS should never give data that used to belong to another process to the user space. As far as I know Microsoft claims that Windows is obeying this rule. I thought Linux does too. Is this wrong? – tichy Mar 10 '11 at 22:00
  • This answer is correct. stdlib memory allocation functions do not initialize/zero memory due to performance issues. You can use memset for this purpose. – dialer Mar 10 '11 at 22:02
  • The operating system most definitely guarantees that you will not see leftover data from another process in your address space. – Erik Mar 10 '11 at 22:06
  • Linux, for example, doesn't give a process any pages containing data for the stack or for mmaps; it maps all these pages to the zero page with writes disabled. If the application reads such a page, it gets zeroes; if it writes to the page, a page fault happens and the page is remapped to an actual physical page. I think that page is cleared (we could already read the page through the virtual address space and it was `0`-filled, so it must stay `0`-filled). – osgx Mar 10 '11 at 22:22
  • @dialer This is not about the stdlib, but the OS. Never, ever should you get memory allocated that contains data from a previous program using it! – Bo Persson Mar 10 '11 at 22:24
5

When you get new memory into your process through brk(), sbrk() or mmap() then it is guaranteed to be zeroed out.
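The zeroing guarantee for fresh mappings can be checked with a small sketch. It assumes Linux with an anonymous private mmap(); the helper name count_nonzero_in_fresh_mapping is made up for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/mman.h>

// Hypothetical helper: maps `len` bytes of fresh anonymous memory and counts
// the non-zero bytes found in it. Returns -1 if the mapping fails.
long count_nonzero_in_fresh_mapping(std::size_t len)
{
    assert(len > 0);
    void* p = mmap(nullptr, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    // Read every byte of the new mapping before anything has written to it.
    const unsigned char* bytes = static_cast<const unsigned char*>(p);
    long nonzero = 0;
    for (std::size_t i = 0; i < len; ++i)
        if (bytes[i] != 0)
            ++nonzero;

    munmap(p, len);
    return nonzero;
}
```

Calling count_nonzero_in_fresh_mapping(1 << 20) from main() should report no non-zero bytes on a standard Linux kernel.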

But the process stack is already allocated to your process. The alloca() function does not get new stack space, it just returns the current stack pointer and moves the pointer to the end of the new block.

So the memory block returned by alloca() has been previously used by your process. Even if you don't have functions before your alloca() in main, the C libraries and dynamic loader have been using the stack.
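The reuse described here can be modelled without touching the real stack. The sketch below is only a toy: ToyStack, push, pop and byte_seen_after_startup are hypothetical names, and the toy grows upward while the real x86 stack grows downward. It shows that moving a pointer back and forth never clears what was written earlier.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Toy model of a stack: a fixed buffer plus a "stack pointer" offset.
struct ToyStack {
    unsigned char memory[256] = {};  // starts zeroed, like fresh pages from the kernel
    std::size_t top = 0;             // offset of the next free byte

    // Like alloca(): hand out the next n bytes and move the pointer. Nothing is cleared.
    unsigned char* push(std::size_t n) {
        assert(top + n <= sizeof memory);
        unsigned char* p = memory + top;
        top += n;
        return p;
    }

    // Like returning from a function: move the pointer back. Nothing is cleared either.
    void pop(std::size_t n) {
        assert(n <= top);
        top -= n;
    }
};

// Simulates startup code using 16 bytes of stack before "main" allocates 64.
// Returns byte i of the block that "main" receives.
unsigned char byte_seen_after_startup(std::size_t i)
{
    assert(i < 64);
    ToyStack s;
    unsigned char* scratch = s.push(16);  // "startup code" takes some space...
    std::memset(scratch, 0xAB, 16);       // ...writes to it...
    s.pop(16);                            // ...and returns

    unsigned char* mine = s.push(64);     // "main" allocates over the same bytes
    return mine[i];
}
```

In this model the first 16 bytes of the new block still hold 0xAB, while the bytes that were never touched remain zero, which matches the pattern seen in the question.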

Zan Lynx
  • Could the C libraries and dynamic loader use as much as 4 kB before entering main? – tichy Mar 10 '11 at 22:05
  • @tichy: They might. I'd have to look. It's possible that the dynamic loader and symbol resolver do it. It's got to do hash table stuff, follow pointer chains and write updated GOT and PLT entries. I could see it using 4K. Check this article out: http://www.symantec.com/connect/articles/dynamic-linking-linux-and-windows-part-one – Zan Lynx Mar 10 '11 at 23:08
4

There is nothing in the alloca documentation that says the memory is initialized, so you're just getting whatever garbage was sitting there.

If you want the memory to be initialized to zeros, you can do the obvious: allocate it and initialize it manually with memset. Or you can use calloc, which guarantees that the memory is initialized to zero.
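A minimal sketch of both options, using heap allocation for simplicity. The function names make_zeroed_with_memset, make_zeroed_with_calloc and all_zero are invented for this example; the same memset call would also work on a block returned by alloca.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Option 1: allocate with malloc, then clear explicitly with memset.
int* make_zeroed_with_memset(std::size_t count)
{
    int* a = static_cast<int*>(std::malloc(count * sizeof(int)));
    if (a != nullptr)
        std::memset(a, 0, count * sizeof(int));
    return a;
}

// Option 2: calloc allocates and zeroes in one call.
int* make_zeroed_with_calloc(std::size_t count)
{
    return static_cast<int*>(std::calloc(count, sizeof(int)));
}

// Returns true if every element of the array is zero.
bool all_zero(const int* a, std::size_t count)
{
    assert(a != nullptr);
    for (std::size_t i = 0; i < count; ++i)
        if (a[i] != 0)
            return false;
    return true;
}
```

Both helpers return a null pointer on allocation failure, and the caller releases the block with std::free.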

R. Martinho Fernandes
4

I am pretty sure that when the OS starts your process, the stack contains only zeros. What you observe is another phenomenon, I think. You seem to have compiled your program as C++. C++ runs a lot of code (constructors and the like) before your main starts. So what you see are the leftover values of your own execution.

If you compiled your code as C (switching to "stdio.h" etc.), you'd probably see much less "pollution", if any at all, particularly if you linked your program statically against a minimalist version of a C library.

Jens Gustedt
  • No. main is not at the top of the stack; lower down are libc internals, `ld.so`, and various kernel-to-userspace data such as argv, envp, and auxv. – osgx Mar 10 '11 at 22:25
  • Changing the code to C and linking it statically reduced the non-zeroed part of the stack to roughly 540 B, so I guess you are right. Thanks for all the answers. – tichy Mar 10 '11 at 22:50
3

The top of the stack contains the environment variable definitions and below them are the command line arguments and the environ and argv arrays.

On x86_64, simple startup code under Linux could look like:

asm(
"       .text\n"
"       .align  16\n"
"       .globl  _start\n"
"       .type   _start,@function\n"
"_start:\n"
"       xor     %rbp, %rbp\n"           // Clear the frame pointer.
"       mov     (%rsp), %rdi\n"         // Get argc...
"       lea     8(%rsp), %rsi\n"        // ... and argv ...
"       mov     %rdi, %rbx\n"           // ... copy argc ...
"       inc     %rbx\n"                 // ... argc + 1 ...
"       lea     (%rsi, %rbx, 8), %rdx\n"// ... and compute environ.
"       andq    $~15, %rsp\n"           // Align the stack on a 16-byte boundary.
"       call    _estart\n"              // Let's go!
"       jmp     .\n"                    // Never gets here.
"       .size   _start, .-_start\n"     
);

Edit:

I completely misread the question. The stuff at the top of the stack in your code is probably the result of the startup code called before main() is entered.

Richard Pennington
  • But with his code... isn't just the pointer itself on the stack? The allocated memory ought to be on the heap, shouldn't it? If not, that uses an AWFUL lot of stack! – The Dag Mar 10 '11 at 21:50
  • If you are writing about my code, you're right, it's awful, but its only purpose is to run an experiment. – tichy Mar 10 '11 at 22:03
  • So could it be as much as 4 kB of the stack used before entering main? – tichy Mar 10 '11 at 22:13
  • That does sound like a lot, but don't forget that, because you're writing in C++, any static constructors will also be executed before main. Also, since you're probably using dynamic libraries, that run-time stuff is also being done. – Richard Pennington Mar 10 '11 at 22:24
  • @tichy I referred to the OP's code, and it's not awful - I was attempting to point out something funny as I thought 8 megabytes would be a *lot* on the stack. My real point was to ask if it's not correct that only the pointer itself (a word) is allocated on the stack (with the OPs code) and the huge block allocated on the heap. I've never been a C or ASM or systems guy and I've been in the cosy confines of .NET since 2001, so I'm not sure. :) – The Dag Mar 10 '11 at 23:17
  • @Dag Memory from alloca is usually allocated on the stack, so the storage automagically gets deallocated when the function returns. Depends on the architecture, of course. – Richard Pennington Mar 10 '11 at 23:20