
I am reading this page about memory overcommit, and it mentions:

The C language stack growth does an implicit mremap. If you want absolute
guarantees and run close to the edge you MUST mmap your stack for the 
largest size you think you will need. For typical stack usage this does
not matter much but it's a corner case if you really really care

When a program is compiled (by gcc, for example), the size limit of the stack is defined (I remember there is a gcc parameter to adjust it). Then, inside the program, we can keep allocating on the stack.

A few questions:

  1. What does "stack growth" mean in this context? Does it mean that if a C program keeps allocating/deallocating on the stack, mremap() will sometimes be called behind the scenes? And why would that happen, if the size limit of the stack has been defined at compile time?
  2. How can we mmap the stack?
HCSF
  • I don't think stack size is a compile-time parameter; it's set at run time as an rlimit. – Nate Eldredge Oct 22 '21 at 05:02
  • There are a few posts about it: [here](https://stackoverflow.com/a/18912422/9784373) and [here](https://stackoverflow.com/a/2275586/9784373). I believe one stack size limit is imposed by the compiler and another by the OS? – HCSF Oct 22 '21 at 05:45
  • One of those posts is for MacOS and the other is for Windows. I don't think they apply to Linux (at least not for x86). – Nate Eldredge Oct 22 '21 at 05:57
  • The whole thing is: if you **know** your program exceeds the stacksize set at compile/load time, you can preset it to a larger value. This can avoid expensive remaps, or even the stack address range colliding with the heap address range. (or other address regions). [Think: RT] – wildplasser Oct 22 '21 at 13:18
  • @wildplasser; Can you, though? Again, on Linux, I'm not sure there's a mechanism for setting the stack size at link/load time. The only way I know is to call `setrlimit(RLIMIT_STACK, ...)` at run time. – Nate Eldredge Oct 22 '21 at 13:56
  • That will be checked at run time. I think the loader/linker sets the size + set up the initial stack segment. [based on what is specified in the objects and in the linker-config] – wildplasser Oct 22 '21 at 14:04
  • @wildplasser: Maybe I should ask a separate question, because I've never seen how this could actually be done. I looked for an ELF header or note option that would do it, and didn't find one. – Nate Eldredge Oct 22 '21 at 14:06
  • Another place to look would be src/kernel/mm/ , or the loader itself – wildplasser Oct 22 '21 at 14:08

1 Answer


The "magic" here is the behaviour of the MAP_GROWSDOWN flag, which the Linux kernel implements for memory that a process requests via the mmap() system call, and which is often used for the initial stack (the stack of the first thread in a process, when it is first executed).

So, while new processes do typically get a MAP_GROWSDOWN stack by default, a process can manage its own stack as well. If the process creates new threads, it has to create stacks for them. (Currently, pthread_create() creates a fixed-size stack (of default maximum size, or sized as directed in the pthread_attr_t attribute block if specified), not a MAP_GROWSDOWN stack.)

The way the Linux kernel implements a MAP_GROWSDOWN memory mapping is that the actual memory is preceded by an extra page, called a "guard page". (On x86-64, pages are aligned units of 4096 bytes, but other page sizes exist; at run time, use sysconf(_SC_PAGESIZE) to obtain the size in bytes.)

Whenever the guard page is first accessed, the kernel converts it to a standard page (same as the other pages in that same mapping), and creates a new guard page just below (at the next smaller page address). If there is something already mapped at those virtual addresses, the mapping is not changed, and the process will receive a SIGSEGV (segmentation violation). Thus, only the amount of available address space (and indirectly, available memory) limits the growth of such stacks.

This also means that using local arrays larger than a page can lead to a SIGSEGV if one relies on MAP_GROWSDOWN automatic stack growth. It is therefore much more reliable to use dynamic memory management in C (malloc()/realloc()/free(), and interfaces like getline() and asprintf()) than to rely on large fixed-size arrays on the stack.

Essentially, as long as the stack elements are at most a page in size, such stacks will automatically grow as needed.
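To watch this growth happen, a minimal sketch along these lines may help. It assumes Linux with /proc mounted, and that the initial thread's mapping is labelled [stack] in /proc/self/maps. The helper names stack_mapping_size() and stack_size_at_depth() are made up for this illustration, and the 4096-byte padding is only an assumed page size.

```c
#include <stdio.h>
#include <string.h>

/* Size in bytes of the mapping labelled [stack] in /proc/self/maps,
   or 0 if it cannot be determined.  (Hypothetical helper.) */
unsigned long stack_mapping_size(void)
{
    FILE *in = fopen("/proc/self/maps", "r");
    char line[512];
    unsigned long lo, hi, size = 0;

    if (!in)
        return 0;

    while (fgets(line, sizeof line, in)) {
        if (strstr(line, "[stack]") && sscanf(line, "%lx-%lx", &lo, &hi) == 2) {
            size = hi - lo;
            break;
        }
    }

    fclose(in);
    return size;
}

/* Recurses `depth` levels, using a bit over 4096 bytes of stack per level,
   and reports the [stack] mapping size as seen from the deepest level.
   (Hypothetical helper; 4096 is an assumed page size.) */
unsigned long stack_size_at_depth(int depth)
{
    volatile char pad[4096];
    unsigned long size;

    pad[0] = (char)depth;   /* touch the frame so its pages are really used */
    size = (depth > 0) ? stack_size_at_depth(depth - 1) : stack_mapping_size();
    pad[4095] = 0;          /* work after the call prevents tail-call elimination */
    return size;
}
```

Calling stack_mapping_size() first and stack_size_at_depth(256) afterwards from the initial thread should typically report a larger mapping the second time. The exact numbers depend on the kernel and on the RLIMIT_STACK setting.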

The "implicit mremap" thus only applies to the initial thread, because its stack is mapped with the MAP_GROWSDOWN flag; the implicit mremap itself refers to this automatic downward growth in page-sized units.

If your process does many separate mmap() calls for different kinds of allocations (say, mapping files into memory), it is possible that they will be located such that the growth of the MAP_GROWSDOWN mappings is limited to less than what the process expects. (The addresses given by the kernel are at least somewhat randomized, for security purposes.)

The suggestion to remap the stack to the largest size one might need means that one can (I'm not sure I agree with "MUST"), near the beginning of the program, use mremap() to convert the MAP_GROWSDOWN mapping to a larger, fixed-size mapping; typically, to the size reported by getrlimit(RLIMIT_STACK,). Because this essentially allocates the address space but does not populate the pages with actual RAM until they are first accessed, the main cost is the kernel metadata (page tables and such).
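As a small illustration of the getrlimit() part only, the following sketch queries the current stack limits. The function names get_stack_limit() and print_stack_limit() are hypothetical, chosen for this example; it does not attempt the remapping itself.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Stores the soft and hard RLIMIT_STACK limits; returns 0 on success,
   -1 (with errno set by getrlimit()) on failure.  (Hypothetical helper.) */
int get_stack_limit(rlim_t *soft, rlim_t *hard)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_STACK, &rl) != 0)
        return -1;

    *soft = rl.rlim_cur;
    *hard = rl.rlim_max;
    return 0;
}

/* Prints one limit, handling the "unlimited" special value. */
static void print_one(const char *label, rlim_t value)
{
    if (value == RLIM_INFINITY)
        printf("%s stack limit: unlimited\n", label);
    else
        printf("%s stack limit: %llu bytes\n", label, (unsigned long long)value);
}

void print_stack_limit(void)
{
    rlim_t soft, hard;

    if (get_stack_limit(&soft, &hard) != 0) {
        perror("getrlimit");
        return;
    }
    print_one("soft", soft);
    print_one("hard", hard);
}
```

The soft limit is the one that bounds automatic stack growth; it is the value shown by `ulimit -s` (in KiB) in most shells.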

It is possible that the C runtime provided by your compiler already does this (to the size reported by getrlimit(RLIMIT_STACK, )) as part of setting up the runtime environment for C (in crt*.o or libgcc*, for example). I haven't checked.

If one wants to, for example when creating a new thread, one can use mmap() (say, mmap((void *)0, size_in_bytes, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK | MAP_GROWSDOWN, -1, 0)) to allocate whatever stack one wants, then use pthread_attr_init() to initialize a thread attribute set, put the address and size of the stack into that thread attribute using pthread_attr_setstack(), and supply a pointer to that thread attribute set as the second parameter to pthread_create(). The created thread will then use that stack.
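The steps in the previous paragraph might be sketched roughly as follows. Note one deliberate difference: this sketch omits MAP_GROWSDOWN and maps a plain fixed-size stack, which is an assumption made here to keep the example predictable. The names run_on_mmap_stack() and store_answer() are hypothetical.

```c
#define _GNU_SOURCE             /* for MAP_ANONYMOUS and MAP_STACK */
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/mman.h>

/* Example thread function: stores 42 through its argument. */
void *store_answer(void *arg)
{
    *(int *)arg = 42;
    return arg;
}

/* Runs fn(arg) in a new thread whose stack is an anonymous mmap() region
   of stack_size bytes, waits for it, and releases the stack.
   Returns 0 on success, otherwise an errno-style error code. */
int run_on_mmap_stack(void *(*fn)(void *), void *arg,
                      size_t stack_size, void **result)
{
    pthread_attr_t attr;
    pthread_t      tid;
    void          *stack;
    int            err;

    /* Fixed-size mapping; MAP_GROWSDOWN is intentionally not used here. */
    stack = mmap(NULL, stack_size, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
    if (stack == MAP_FAILED)
        return errno;

    err = pthread_attr_init(&attr);
    if (!err) {
        /* The address passed is the lowest address of the stack region. */
        err = pthread_attr_setstack(&attr, stack, stack_size);
        if (!err)
            err = pthread_create(&tid, &attr, fn, arg);
        pthread_attr_destroy(&attr);
    }

    /* Only wait if the thread was actually created. */
    if (!err)
        err = pthread_join(tid, result);

    munmap(stack, stack_size);
    return err;
}
```

For example, `run_on_mmap_stack(store_answer, &value, 1 << 20, &ret)` would run the thread on a 1 MiB stack. Depending on the C library version, the program may need to be built with `-pthread`.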

Modifying the currently used stack is much trickier, and is best done in the C runtime (in machine code, written in assembly) before the actual compiled C code is run in the process. In C, it can be done via getcontext()/setcontext(), by creating a new context (as if it was a new thread), setting up a new stack for it, switching to the new context, and then freeing the old stack.
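A much simpler relative of that technique is to run a single function on a separately allocated stack and then return. The sketch below does only that; it does not replace or free the original stack. It uses makecontext() and swapcontext() from the same <ucontext.h> family rather than setcontext(), and the names run_on_new_stack() and work() are hypothetical.

```c
#include <stdlib.h>
#include <ucontext.h>

static ucontext_t main_ctx;     /* where to resume when work() returns */
static ucontext_t work_ctx;     /* context that runs on the new stack  */
int work_done;                  /* set by work() so callers can check  */

/* Runs on the new stack; returning resumes main_ctx via uc_link. */
static void work(void)
{
    work_done = 1;
}

/* Allocates a stack of `size` bytes, runs work() on it, then frees it.
   Returns 0 on success, -1 on failure. */
int run_on_new_stack(size_t size)
{
    void *stack = malloc(size);

    if (!stack)
        return -1;

    if (getcontext(&work_ctx) != 0) {
        free(stack);
        return -1;
    }

    work_ctx.uc_stack.ss_sp   = stack;
    work_ctx.uc_stack.ss_size = size;
    work_ctx.uc_link          = &main_ctx;
    makecontext(&work_ctx, work, 0);

    /* Saves the current context in main_ctx and switches to work_ctx;
       control comes back here once work() returns. */
    if (swapcontext(&main_ctx, &work_ctx) != 0) {
        free(stack);
        return -1;
    }

    free(stack);
    return 0;
}
```

A call such as `run_on_new_stack(64 * 1024)` is enough to try it out. Permanently switching the process to a new stack requires considerably more care than this.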

In many cases, signal handlers are set to use a separate stack, by calling sigaltstack(). This is very useful, because then signals due to e.g. stack overflow can still be acted upon.
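A minimal sketch of setting up such an alternate signal stack follows. The 64 KiB size, the helper name install_on_alt_stack(), and the demonstration handler note_stack() are all assumptions made for this example; a real handler for stack overflow would be installed for SIGSEGV and would have to be written very carefully.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define ALT_STACK_SIZE (64 * 1024)      /* assumed size for this sketch */

char *alt_stack_base;                   /* lowest address of the alternate stack */
volatile sig_atomic_t ran_on_alt_stack; /* set to 1 if the handler ran on it */

/* Demonstration handler: checks whether one of its own local variables
   lives inside the alternate stack. */
void note_stack(int signo)
{
    char marker;
    uintptr_t p  = (uintptr_t)&marker;
    uintptr_t lo = (uintptr_t)alt_stack_base;

    (void)signo;
    ran_on_alt_stack = (p >= lo && p < lo + ALT_STACK_SIZE);
}

/* Allocates an alternate stack for the calling thread and installs
   `handler` for `signo` so that it runs on that stack.
   Returns 0 on success, -1 on failure. */
int install_on_alt_stack(int signo, void (*handler)(int))
{
    stack_t          ss;
    struct sigaction sa;

    alt_stack_base = malloc(ALT_STACK_SIZE);
    if (!alt_stack_base)
        return -1;

    ss.ss_sp    = alt_stack_base;
    ss.ss_size  = ALT_STACK_SIZE;
    ss.ss_flags = 0;
    if (sigaltstack(&ss, NULL) != 0)
        return -1;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = handler;
    sa.sa_flags   = SA_ONSTACK;         /* deliver on the alternate stack */
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}
```

To try it without crashing anything, one can install note_stack() for SIGUSR1, call raise(SIGUSR1), and then inspect ran_on_alt_stack.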

Finally, recall that in Linux, /proc/PID/maps describes all existing mappings for process PID. For the process itself, you can always use /proc/self/maps. You might find the following dump_maps() function useful when experimenting with this stuff:

```c
#include <stdlib.h>
#include <string.h>
#include <stdio.h>
#include <errno.h>

/* Returns 0 if memory mappings printed to standard output,
   an errno error code if an error occurs.
*/
int dump_maps(void)
{
    FILE *in;
    int   ch;

    in = fopen("/proc/self/maps", "r");
    if (!in) {
        const int saved_errno = errno;
        fprintf(stderr, "Cannot open /proc/self/maps: %s.\n", strerror(saved_errno));
        return errno = saved_errno;
    }

    printf("  MinAddress-MaxAddress  Perms Offset  Device   Inode                    Pathname-or-Description\n");

    /* Yes, this is the slowest possible way to copy a file to standard output,
       but it should not matter for this use case.  The KISS principle. */
    while ((ch = getc(in)) != EOF)
        putchar(ch);

    putchar('\n');

    fclose(in);
    return 0;
}

int main(void)
{
    dump_maps();

    return EXIT_SUCCESS;
}
```

For further info on /proc/self/maps and other /proc pseudofiles, see man 5 proc. (They're not files in the sense of existing on any storage device; the kernel generates them as they are accessed, and they are a very efficient interface for this kind of stuff.)

  • Thanks for your detailed answer. Just trying to make sure I understand your answer: so `mremap()` is called when the main thread tries to call `mmap()` and the total size of mmap-ed region(s) is greater than what the kernel could tell at startup (by looking at some ELF header?) – HCSF Oct 23 '21 at 03:10
  • @HCSF: No. The kernel document suggests that programs MUST do an mremap()/mmap() call to convert their `MAP_GROWSDOWN` stacks to desired fixed size, preferably before execution of main() is started. (That is, in C library runtime code, crt*.o, or compiler provided code, e.g. libgcc*.a. I haven't checked if they do this already.) – Blabbo the Verbose Oct 23 '21 at 13:45
  • @HCSF: Without doing the above, the program has a `MAP_GROWSDOWN` stack for its initial thread by default. When the program uses so much stack that the stack guard page just below the stack is accessed, then the kernel internally extends the stack down by one page. You can think of it as the kernel doing a kind-of-an-mmap() call to make the page preceding the guard page into a new guard page, and if that succeeds, doing an mprotect() to convert the old guard page to a normal stack page; on behalf of the user process. This is what is meant by "implicit mremap()" in the part you quoted. – Blabbo the Verbose Oct 23 '21 at 13:50
  • I myself kinda-sorta disagree with the "MUST" in the quote, because I do not use local variables in functions that total to a page or more: I use dynamic memory management for arrays and buffers instead. This means that a `MAP_GROWSDOWN` auto-growing stack suits my needs, I do not suffer from its limitations, and since my processes don't use stack that much (only recursion, and I use data structures where recursion is limited anyway), my processes are not affected by the drawbacks of auto-growing stacks unless they're already running out of memory anyway. – Blabbo the Verbose Oct 23 '21 at 13:54
  • `MAP_GROWSDOWN` is unusable for thread stacks (and most other purposes) because it fails to stop other `mmap` calls from randomly picking a page just below the guard page, cutting off the room for future growth. See also [Analyzing memory mapping of a process with pmap. \[stack\]](https://stackoverflow.com/a/56920770) - the main thread's stack internally uses the `VM_GROWSDOWN` flag that `MAP_GROWSDOWN` sets, but it also has other magic that reserves space to grow into. – Peter Cordes Sep 29 '22 at 18:49