1

In Linux, is calloc exactly the same as malloc + memset, or does this depend on the exact Linux/kernel version?

I am particularly interested in whether you can calloc more RAM than you physically have (you can certainly malloc more RAM than you physically have, you just can't write to all of it). In other words, does calloc always actually write to the memory it allocates, as the specs suggest it should?

Simd
  • 5
    `calloc` and `malloc` are not kernel operations, they're just C library functions. – Barmar Nov 03 '13 at 09:58
  • @Barmar True but where is the "optimistic memory allocation strategy" implemented? In the library or kernel? – Simd Nov 03 '13 at 09:59
  • possible duplicate of [Why malloc+memset is slower than calloc?](http://stackoverflow.com/questions/2688466/why-mallocmemset-is-slower-than-calloc) – Netherwire Nov 03 '13 at 09:59
  • 1
    @Barmar they are not just C library functions, as they call syscalls. – ouah Nov 03 '13 at 09:59
  • @ouah: traditionally, the system call underlying `malloc()` et al was `sbrk()`, which is not a POSIX standard system call. I don't know whether Linux has actual system calls for the memory management functions, but it seems unlikely since you can use alternative libraries for memory allocation. – Jonathan Leffler Nov 03 '13 at 10:04
  • @JonathanLeffler *glibc* and like use [Doug Lea's malloc.c](http://webcache.googleusercontent.com/search?q=cache:8aJdhhEBFsEJ:ftp://g.oswego.edu/pub/misc/malloc.c&hl=en&gl=ca&strip=1) or a derivative. It uses `sbrk()` and `mmap()` for large regions. And `mremap()` for an efficient `realloc()` on Linux. Doug Lea is currently/recently working on Java's GC. – artless noise Nov 04 '13 at 18:26
  • At one point *glibc* malloc, etc was as per Doug Lea. [Here is a more current version](https://sourceware.org/git/?p=glibc.git;a=blob;f=malloc/malloc.c;h=897c43a39d963580f701518c9ecc1cc7b9275942;hb=HEAD) of the *glibc* allocator. It is much the same, but is not 100% identical. – artless noise Nov 04 '13 at 18:31

4 Answers

6

Of course, that depends on the implementation, but on a modern day Linux, you probably can. Easiest way is to try it, but I'm saying this based on the following logic.

You can malloc more memory than you have (physical RAM + swap) because the kernel delays the actual allocation of your memory until you use it. I believe that's to increase the chances of your program not failing due to memory limits, but that's not the question.

calloc is the same as malloc, except that it zero-initializes the memory. When you ask Linux for a page of memory, Linux already zero-initializes it. So if calloc can tell that the memory it is returning was just requested from the kernel, it doesn't actually have to zero-initialize it! Since it doesn't, that memory is never accessed, and therefore calloc should be able to request more memory than there actually is.
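One way to check this reasoning is to count how many pages of a large block are actually resident right after allocation. The following is a minimal, Linux-specific sketch, not a definitive test: the helper names (`resident_pages`, `resident_after_calloc`, `resident_after_malloc_memset`) are made up for illustration, and it assumes a C library such as glibc that serves large requests with fresh `mmap()` memory. `memset` is called through a volatile function pointer because some compilers may otherwise fold `malloc` + `memset(0)` into a single `calloc` call.

```c
#define _DEFAULT_SOURCE            /* exposes mincore() on glibc */
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Count the resident pages among the whole pages that lie fully inside
 * [buf, buf + size). Returns -1 if the query itself fails. */
static long resident_pages(const void *buf, size_t size)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    uintptr_t mask = ~(uintptr_t)(page - 1);
    uintptr_t start = ((uintptr_t)buf + page - 1) & mask;  /* round up   */
    uintptr_t end = ((uintptr_t)buf + size) & mask;        /* round down */
    if (end <= start)
        return 0;

    size_t npages = (end - start) / page;
    unsigned char *vec = malloc(npages);
    if (vec == NULL)
        return -1;

    long count = -1;
    if (mincore((void *)start, end - start, vec) == 0) {
        count = 0;
        for (size_t i = 0; i < npages; i++)
            count += vec[i] & 1;   /* low bit set means "resident" */
    }
    free(vec);
    return count;
}

/* Resident pages immediately after calloc(), before any access. */
long resident_after_calloc(size_t size)
{
    void *p = calloc(1, size);
    if (p == NULL)
        return -1;
    long n = resident_pages(p, size);
    free(p);
    return n;
}

/* Resident pages after malloc() followed by an explicit zero fill. */
long resident_after_malloc_memset(size_t size)
{
    /* Volatile pointer: keeps the compiler from merging this into calloc(). */
    void *(*volatile zero_fill)(void *, int, size_t) = memset;
    void *p = malloc(size);
    if (p == NULL)
        return -1;
    zero_fill(p, 0, size);
    long n = resident_pages(p, size);
    free(p);
    return n;
}
```

Calling both helpers with, say, `(size_t)64 << 20` and printing the results lets you compare the two paths on your own system. The exact numbers depend on the allocator, the kernel, and settings such as transparent huge pages, so treat them as an observation rather than a guarantee.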

As mentioned in the comments, this answer provides a very good explanation.

Shahbaz
  • @tux3, if you are in kernel space, you can do anything. The question is about user space, where the `malloc` and `calloc` functions are available. If you _could_ get non-initialized pages from the kernel in your user application, that would be a serious security bug. – Shahbaz Jan 09 '15 at 10:14
2

Whether calloc needs to write to the memory depends on whether it got the allocation from heap pages that are already assigned to the process, or had to ask the kernel to assign more memory to the process (using a system call such as sbrk() or mmap()). When the kernel assigns new memory to a process, it always zeroes it first (typically using a VM optimization, so it doesn't actually have to write to the page). But if calloc is reusing memory that was assigned to the process previously, it has to use memset() to zero it.
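The two paths described above can be sketched with a toy allocator. This is only an illustration under simplifying assumptions, not how glibc is actually written: `toy_calloc`, `toy_free`, the one-slot cache, and the `last_call_used_memset` flag are all hypothetical names, and every fresh block comes straight from `mmap()`.

```c
#define _DEFAULT_SOURCE            /* exposes MAP_ANONYMOUS on glibc */
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical one-slot cache holding a previously freed block. */
static void *cached_block = NULL;
static size_t cached_size = 0;

/* Set by toy_calloc() so a caller can see which path was taken. */
static int last_call_used_memset = 0;

void *toy_calloc(size_t nmemb, size_t size)
{
    /* Reject products that would overflow size_t, as calloc() must. */
    if (size != 0 && nmemb > SIZE_MAX / size)
        return NULL;
    size_t total = nmemb * size;
    if (total == 0)
        total = 1;                 /* mmap() rejects a zero length */

    if (cached_block != NULL && cached_size == total) {
        /* Recycled memory may hold stale data: zero it explicitly. */
        void *p = cached_block;
        cached_block = NULL;
        cached_size = 0;
        memset(p, 0, total);
        last_call_used_memset = 1;
        return p;
    }

    /* Fresh anonymous pages from the kernel already read as zero,
     * so no memset() is needed on this path. */
    void *p = mmap(NULL, total, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    last_call_used_memset = 0;
    return p == MAP_FAILED ? NULL : p;
}

/* The caller passes the size back, which keeps the toy free of headers. */
void toy_free(void *p, size_t size)
{
    if (p == NULL)
        return;
    if (cached_block == NULL) {
        cached_block = p;          /* keep one block around for reuse */
        cached_size = size;
    } else {
        munmap(p, size);           /* otherwise hand it back to the kernel */
    }
}
```

Allocating a block, dirtying it, freeing it, and allocating the same size again exercises both branches: the first call skips the `memset()`, the second one cannot.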

Barmar
1

This is not mentioned in the cited duplicate or here: Linux uses virtual memory and can allocate more memory than is physically available in the system. A naive implementation of calloc() that simply does a malloc() plus memset() in user space will touch every page.

Linux typically allocates memory in 4k pages, and every untouched page of a calloc() block is identical: it initially reads as zero. That is, the same 4k zero page can be mapped read-only for all of them, so the entire calloc() space takes up only approximately size/4k * pointer_size + 4k. When the program writes to the calloc() space, a page fault occurs, and Linux allocates a new page (4k) and resumes the program.

This is called copy-on-write, or COW for short. malloc() will generally behave the same way. For small sizes, the C library will use binning and share 4k pages with other small allocations.
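To put a number on the size/4k * pointer_size + 4k estimate, here is a small worked calculation. The function name `untouched_cost_bytes` is hypothetical, and the model is only the rough one from this answer: one page-table entry per page that has been read but not written, plus the single shared zero page. Page size and entry size are parameters because both vary by platform.

```c
#include <stdint.h>

/* Rough bookkeeping cost, in bytes, of a zero-filled region that has been
 * read but never written: one page-table entry per page, plus one shared
 * zero page. This follows the estimate in the text, nothing more precise. */
uint64_t untouched_cost_bytes(uint64_t region_size,
                              uint64_t page_size,
                              uint64_t pte_size)
{
    uint64_t pages = (region_size + page_size - 1) / page_size; /* round up */
    return pages * pte_size + page_size;
}
```

For a 1 GiB region with 4096-byte pages and 8-byte entries, this gives 262144 * 8 + 4096 = 2101248 bytes, roughly 2 MiB of bookkeeping for 1 GiB of address space.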

So, there are typically two layers involved.

  1. Linux kernel's process memory management.
  2. glibc heap management.

If the requested size is large and requires new memory to be allocated to the process, then most of the above applies (via Linux's process memory management). However, if the requested size is small, calloc() behaves like a malloc() plus memset(). For large allocations, an explicit memset() is harmful: it touches every page, so the kernel has to allocate a physical page for each one.
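As a usage sketch of the large-allocation case, the following table relies on calloc() for its zeros instead of calling memset(). The names (`sparse_table_new`, `sparse_table_set`, `sparse_table_get`) and the choice of one 32-bit slot per Unicode code point are illustrative assumptions. Whether untouched pages really stay unbacked depends on the C library taking the fresh-memory path for a request of this size.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sparse table: one 32-bit slot per Unicode code point. */
#define CODE_POINTS 0x110000u

/* About 4.25 MiB of address space. No memset() is issued here, so pages
 * that are never written need not be backed by physical memory. */
uint32_t *sparse_table_new(void)
{
    return calloc(CODE_POINTS, sizeof(uint32_t));
}

/* Store a value; returns 0 on success, -1 on a bad table or index. */
int sparse_table_set(uint32_t *table, uint32_t code_point, uint32_t value)
{
    if (table == NULL || code_point >= CODE_POINTS)
        return -1;
    table[code_point] = value;   /* the first write to a page faults it in */
    return 0;
}

/* Slots that were never set read back as 0. */
uint32_t sparse_table_get(const uint32_t *table, uint32_t code_point)
{
    if (table == NULL || code_point >= CODE_POINTS)
        return 0;
    return table[code_point];
}
```

Release the table with `free()` when done. If the populated entries cluster in a few ranges, only the pages covering those ranges should end up resident.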

artless noise
  • `calloc()` can be advantageous to allocate large sparse arrays. Especially if the *occupied* chunks are clustered, like Unicode pages, etc. Only 4k pages where *non-zero* data is used need to be physically allocated. – artless noise Nov 04 '13 at 18:20
  • The 4k number is `malloc_getpagesize()` in dlmalloc. You can also use `sysconf(_SC_PAGESIZE)`, [`getpagesize()`](http://man7.org/linux/man-pages/man2/getpagesize.2.html), etc. On some systems, this is 8k. – artless noise Nov 04 '13 at 18:35
  • See also: [Linux Zero page (BSS)](http://stackoverflow.com/questions/12115434/linux-will-zeroed-page-pagefault-on-first-read-or-on-first-write), [Ulrich Drepper's Linux VMM](http://lwn.net/Articles/253361/), [Memory FAQ](http://landley.net/writing/memory-faq.txt), [lxr empty_zero_page](http://lxr.free-electrons.com/ident?i=empty_zero_page). – artless noise Nov 04 '13 at 20:29
0

You can't malloc(3) more RAM than the kernel gives the process doing the malloc(3)-ing. malloc(3) returns NULL if it can't allocate the amount of memory you asked for. In addition, malloc(3) and memset(3) are defined by your C library (libc.so), not by your kernel. The Linux kernel defines mmap(2) and other low-level memory allocation functions, not the *alloc(3) family (excluding kmalloc()).

cyphar