16

I am using Debian squeeze and have noticed that memory is always zeroed. Is this new in Linux distributions? Some time ago, I believe I could use puts() and garbage would be output.

I have run this test program many times, but the commented results are always the same. (I have randomize_va_space=2 in sysctl.conf, so I know that memory in different locations is being used on each run.)


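#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    char *a = malloc(50000000);
    if (a == NULL)
        return 1;
    a[49999999] = '\0';
    puts(a); // it outputs nothing since all are zeroes
    printf("%p\n", (void *)a);
    if (a[5000] == '\0') // this condition is always true
        puts("It is a nul char.");
    free(a);
    return 0;
}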

Is it possible to make the system not zero memory? What options could this Debian squeeze installation have activated that make it always zero memory?

Nuri Dure

6 Answers

23

On any modern operating system, the only way newly obtained memory will contain nonzero values is if memory previously freed by your program got reused by malloc. When new memory is obtained from the operating system (kernel), it is initially purely virtual. It has no physical existence; instead it is mapped as copy-on-write mappings of a single shared memory page that's full of 0 bytes. The first time you attempt to write to it, the kernel will trap the write, allocate a new page of physical memory, copy the contents of the original page (which in this case are all 0 bytes) to the new page, and then resume your program. If the kernel knows the newly allocated physical memory is already zero-filled, it might even be able to optimize out the copy step.

This procedure is both necessary and efficient. It's necessary because handing over memory that might contain private data from the kernel or another user's processes to your process would be a critical security breach. It's efficient because no zeroing is performed at allocation time; the "zero-filled" pages are just references to a shared zero page.

R.. GitHub STOP HELPING ICE
  • There's a thread in Windows whose job it is to zero out unused physical pages to provide a pool of new pages that can safely be mapped into user space. (By comparison the kernel is allowed to allocate unzeroed pages for its own use.) – Neil May 14 '11 at 22:06
  • However, kernel developers must still ensure that the data in their "unzeroed" pages of memory are not leaked to any user-mode processes. Furthermore, given that the memory is zeroed in the background, there is minimal impact on the system, unless there is significant memory churn. But churning through memory is likely a performance problem regardless of any zeroing. – Brian May 14 '11 at 22:43
  • Can you rely on this behavior and is it portable? That new program pages are always "zero-filled"? – user129393192 Jun 13 '23 at 04:39
  • @user129393192: If you're obtaining them yourself via `mmap`, yes, absolutely. The contents are specified to be zero. Historically, the mechanism of this even had clear zeroing semantics: you had to `mmap` a file, and the file was `/dev/zero`, the "zero device" that produces an unlimited number of zeros. The modern replacement is `MAP_ANON`/`MAP_ANONYMOUS`, which is specified to do the same without needing the ability to open a device file. – R.. GitHub STOP HELPING ICE Jun 13 '23 at 13:13
  • If you're using `malloc`, on the other hand? Of course not. You have no guarantee that the memory is "newly-obtained", or that the implementation has not stored malloc bookkeeping information, trap patterns to catch application bugs, or whatever in the memory. You have to treat it as uninitialized. You can use `calloc` if you want zero-filled, though, and it will typically take advantage of new memory being "naturally zero filled" if possible. – R.. GitHub STOP HELPING ICE Jun 13 '23 at 13:14
7

From what I read in Linux Kernel Development, the kernel zeroes pages because they may contain kernel data that a user program could interpret and somehow use to gain access to the system.

malloc asks the kernel for more pages, so the kernel is responsible for that memory that you are receiving.

Iustin
  • According to this WP page on the brk/sbrk functions: http://en.wikipedia.org/wiki/Sbrk you are right. But this seems like a very wasteful thing for the kernel to do. –  May 14 '11 at 21:29
  • 2
    Why? It seems like a clever thing for a program to do. If you have a very stupid program that holds stupid data unencrypted and then just dies without free()'ing it, you could potentially write a program to take advantage of that. I'm pretty sure you can disable it when you compile a kernel though. – Dhaivat Pandya May 14 '11 at 23:12
  • "Disable it"? There's definitely no way to make a kernel leak data to userspace via normal options; you'd have to intentionally break it to do that. Due to the fact that new pages are COW references to the zero page, there is no "default case" that would leak. – R.. GitHub STOP HELPING ICE May 15 '11 at 03:43
  • 1
    You can disable it (usually only done for embedded systems where only your software is running). Zeroing the memory is absolutely the right thing for the kernel to do on multi-user systems. – Eloff Feb 04 '14 at 23:28
4

The first time you malloc a chunk of memory, there's a fair chance it will be zero, because memory allocated by a system call (sbrk, mmap) is zeroed by the kernel. But if you free and malloc again, the memory is recycled and may not contain zeros.
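A minimal sketch of the recycling case, assuming nothing beyond standard C. The helper name `count_nonzero_after_reuse` is made up for this illustration, and the result is deliberately not predictable: the allocator is free to return the old block, a different one, or a block with its own bookkeeping data written into it.

```c
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical helper: dirty a block of `n` bytes, free it, allocate
 * `n` bytes again, and count how many bytes of the new block are nonzero.
 * The count is unspecified; it depends on whether and how the allocator
 * reuses the freed block. Returns (size_t)-1 if an allocation fails.
 */
size_t count_nonzero_after_reuse(size_t n)
{
    unsigned char *first = malloc(n);
    if (first == NULL)
        return (size_t)-1;

    /* Write through a volatile pointer so the compiler cannot drop
       the stores as dead writes to memory that is about to be freed. */
    volatile unsigned char *v = first;
    for (size_t i = 0; i < n; i++)
        v[i] = 'b';
    free(first);

    unsigned char *second = malloc(n);
    if (second == NULL)
        return (size_t)-1;

    /* The contents are indeterminate; reading them as unsigned char
       yields unspecified values. */
    size_t nonzero = 0;
    for (size_t i = 0; i < n; i++) {
        if (second[i] != 0)
            nonzero++;
    }

    free(second);
    return nonzero;
}
```

With a small size such as `count_nonzero_after_reuse(4096)`, many allocators hand the freed block straight back, so the count is often close to `n`. That is an observation about typical implementations, not something the C standard promises.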

augustss
2

You'll find that memory is zeroed on most operating systems that have isolation between processes. The reason is that a process must not be allowed to peek at the memory released by another process, so a memory page must be erased between the time it's freed by some process and the time it's handed to another process. In practice, erased means zeroed, and the memory is usually zeroed at the time it's allocated by the process.

When you call malloc in your toy program, the memory hasn't been used for anything else yet. So it's still fresh from the kernel, full of zeros. If you try in a real program that's already allocated and freed a lot of heap blocks, you'll find that memory that's already been used by your process still contains whatever garbage you (or the memory management system) may have put there.

Gilles 'SO- stop being evil'
2

As already illustrated, the key difference is first-time allocation vs. reallocation of memory the program has already used. If you try:

char *a, tst;
do {
    a = malloc(50000000);
    a[49999999] = '\0';
    // print at most 50 characters: nothing the 1st time, but bbbb.... the 2nd
    printf("%.50s\n%p\n", a, (void *)a);
    tst = a[5000];
    memset(a, 'b', 50000000);
    free(a);
} while (tst == '\0');

it will most likely run the loop twice, printing an empty string the first time and b's the second (at least if the pointers are the same).

The key is that the memory block returned by malloc() has indeterminate contents. It may or may not be zeroes, depending on how memory allocation has been done in the past by the program (or what memory debugging facilities are in use).

If you want to guarantee contents, you need calloc() or explicit initialization after allocation.
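Both options can be sketched as follows. The helper names `alloc_zeroed_calloc`, `alloc_zeroed_memset` and `all_zero` are made up for this illustration.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Option 1: calloc() returns memory with all bytes set to zero. */
char *alloc_zeroed_calloc(size_t n)
{
    return calloc(n, 1);
}

/* Option 2: malloc() followed by explicit initialization. */
char *alloc_zeroed_memset(size_t n)
{
    char *p = malloc(n);
    if (p != NULL)
        memset(p, 0, n);
    return p;
}

/* Returns 1 if the first `n` bytes at `p` are all zero, 0 otherwise. */
int all_zero(const char *p, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (p[i] != 0)
            return 0;
    }
    return 1;
}
```

Either helper gives a block for which `all_zero(p, n)` holds, regardless of what the program allocated and freed beforehand. The caller still has to check for a NULL return and free() the block.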

The system's integrity / data separation guarantee, on the other hand, means that any fresh address space requested from the system - whether via sbrk() or mmap(MAP_ANON) - must be zero-initialized, as any other contents would constitute a security breach.

FrankH.
1

Your code does not test if all memory is zeroed - it tests if two specific bytes are zero - a[0] and a[5000]. Also, malloc() has nothing to do with the kernel - it is a C library function, not a system call. It is highly unlikely that its implementers zero memory - what you are seeing is just some random quirk of your particular configuration.