
I tried to create a MAP_GROWSDOWN mapping, expecting it to grow automatically. As specified in the manual page:

MAP_GROWSDOWN

This flag is used for stacks. It indicates to the kernel virtual memory system that the mapping should extend downward in memory. The return address is one page lower than the memory area that is actually created in the process's virtual address space. Touching an address in the "guard" page below the mapping will cause the mapping to grow by a page. This growth can be repeated until the mapping grows to within a page of the high end of the next lower mapping, at which point touching the "guard" page will result in a SIGSEGV signal.

So I wrote the following example to test the mapping growing:

#ifndef _GNU_SOURCE
    #define _GNU_SOURCE
#endif
#include <stdlib.h>
#include <string.h>
#include <inttypes.h>
#include <errno.h>
#include <sys/mman.h>
#include <stdio.h>

int main(void){
    char *mapped_ptr = mmap(NULL, 4096,
                            PROT_READ | PROT_WRITE,
                            MAP_ANONYMOUS | MAP_PRIVATE | MAP_STACK | MAP_GROWSDOWN,
                            -1, 0);
    if(mapped_ptr == MAP_FAILED){
        int error_code = errno;
        fprintf(stderr, "Cannot do MAP_GROWSDOWN mapping. "
                        "Error code = %d, details = %s\n", error_code, strerror(error_code));
        exit(EXIT_FAILURE);
    }
    volatile char *c_ptr_1 = mapped_ptr; //address returned by mmap
    *c_ptr_1 = 'a'; //fine

    volatile char *c_ptr_2 = mapped_ptr - 4095; //1 page below the guard
    *c_ptr_2 = 'b'; //crashes with SEGV
}

So I got a SIGSEGV instead of the mapping growing. What does "growing" mean here?

St.Antario

3 Answers


First of all, you don't want MAP_GROWSDOWN, and it's not how the main thread stack works (see "Analyzing memory mapping of a process with pmap. [stack]"). Nothing uses it, and pretty much nothing should use it. The man page text saying it's "used for stacks" is wrong and should be fixed.

I suspect it might be buggy (because nothing uses it so usually nobody cares or even notices if it breaks.)


Your code works for me if I change the mmap call to map more than 1 page. Specifically, I tried 4096 * 100. I'm running Linux 5.0.1 (Arch Linux) on bare metal (Skylake).

/proc/PID/smaps does show a gd flag.
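If you want to check for the gd flag from inside the program rather than by eye, something like the following sketch could work. The helper names `smaps_has_flag` and `vma_has_flag` are made up for illustration, and the parser assumes the usual smaps layout: a `start-end perms ...` header line per mapping, followed by field lines that include `VmFlags:`.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: scan smaps-formatted text for the mapping that
 * contains addr and report whether its VmFlags line lists `flag`
 * (for example "gd").
 * Returns 1 if listed, 0 if not listed, -1 if no VmFlags line was
 * found for a mapping containing addr. */
int smaps_has_flag(FILE *f, uintptr_t addr, const char *flag)
{
    char line[1024];
    int inside = 0;   /* currently within the entry that contains addr? */

    while (fgets(line, sizeof line, f)) {
        unsigned long start, end;

        /* Header lines look like "7f00...-7f00... rw-p 00000000 ...". */
        if (sscanf(line, "%lx-%lx", &start, &end) == 2) {
            inside = (addr >= start && addr < end);
            continue;
        }
        if (inside && strncmp(line, "VmFlags:", 8) == 0) {
            /* Flags are space-separated two-letter codes. */
            for (char *tok = strtok(line + 8, " \n"); tok;
                 tok = strtok(NULL, " \n"))
                if (strcmp(tok, flag) == 0)
                    return 1;
            return 0;
        }
    }
    return -1;
}

/* Hypothetical convenience wrapper for the current process. */
int vma_has_flag(const void *addr, const char *flag)
{
    FILE *f = fopen("/proc/self/smaps", "r");
    if (!f)
        return -1;
    int r = smaps_has_flag(f, (uintptr_t)addr, flag);
    fclose(f);
    return r;
}
```

Calling `vma_has_flag(ptr, "gd")` right after the mmap call should then tell you whether the kernel recorded the flag for that mapping. It says nothing about whether the mapping will actually grow.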

And then (when single-stepping the asm) the maps entry does actually change to a lower start address but the same end address, so it is literally growing downward when I start with a 400 KiB mapping. This gives a 400 KiB initial allocation above the returned address, which grows to 404 KiB when the program runs. (The size for a _GROWSDOWN mapping is not the growth limit or anything like that.)

https://bugs.centos.org/view.php?id=4767 may be related; something changed between kernel versions in CentOS 5.3 and 5.5. And/or it had something to do with working in a VM (5.3) vs. not growing and faulting on bare metal (5.5).


I simplified the C to use ptr[-4095] etc:

#define _GNU_SOURCE
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

int main(void){
    volatile char *ptr = mmap(NULL, 4096*100,
                            PROT_READ | PROT_WRITE,
                            MAP_ANONYMOUS | MAP_PRIVATE | MAP_STACK | MAP_GROWSDOWN,
                            -1, 0);
    if(ptr == MAP_FAILED){
        int error_code = errno;
        fprintf(stderr, "Cannot do MAP_GROWSDOWN mapping. "
                        "Error code = %d, details = %s\n", error_code, strerror(error_code));
        exit(EXIT_FAILURE);
    }

    ptr[0] = 'a';      // address returned by mmap
    ptr[-4095] = 'b';  // grow by 1 page
}

Compiling with gcc -Og gives asm that's nice-ish to single-step.


BTW, various rumours about the flag having been removed from glibc are obviously wrong. This source does compile, and it's clear that it's also supported by the kernel, not silently ignored. (Although the behaviour I see with size 4096 instead of 400 KiB is exactly consistent with the flag being silently ignored. However, the gd VmFlag is still there in smaps, so it's not ignored at that stage.)

I checked and there was room for it to grow without coming close to another mapping. So IDK why it didn't grow when the GD mapping was only 1 page. I tried a couple times and it segfaulted each time. With the larger initial mapping it never faulted.

Both times were with a store to the mmap return value (the first page of the mapping proper), then a store 4095 bytes below that.

Peter Cordes

I know the OP has already accepted one of the answers, but unfortunately it does not explain why MAP_GROWSDOWN seems to work sometimes. Since this Stack Overflow question is one of the first hits in search engines, let me add my answer for others.

The documentation of MAP_GROWSDOWN needs updating. In particular:

This growth can be repeated until the mapping grows to within a page of the high end of the next lower mapping, at which point touching the "guard" page will result in a SIGSEGV signal.

In reality, the kernel does not allow a MAP_GROWSDOWN mapping to grow closer than stack_guard_gap pages away from the preceding mapping. The default value is 256, but it can be overridden on the kernel command line. Since your code does not specify any desired address for the mapping, the kernel chooses one automatically, but is quite likely to end up within 256 pages from the end of an existing mapping.
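To see whether a given mapping has enough room below it, you can measure the hole under it in /proc/self/maps. The sketch below is only an approximation of the kernel's check: `gap_below` and `can_grow_one_page` are hypothetical helper names, and the 256-page figure is simply the default quoted above, hard-coded rather than read from the running kernel.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: given maps-formatted text (sorted by address, as
 * /proc/self/maps is), return the number of unmapped bytes between the
 * mapping that contains addr and the end of the mapping just below it.
 * Returns -1 if addr is not inside any mapping. */
long long gap_below(FILE *maps, uintptr_t addr)
{
    char line[1024];
    unsigned long prev_end = 0;

    while (fgets(line, sizeof line, maps)) {
        unsigned long start, end;

        if (sscanf(line, "%lx-%lx", &start, &end) != 2)
            continue;                      /* not a mapping line */
        if (addr >= start && addr < end)
            return (long long)(start - prev_end);
        prev_end = end;
    }
    return -1;
}

/* Assumption: stack_guard_gap is at its default of 256 pages.
 * Growing down by one page needs that page plus the full guard gap. */
int can_grow_one_page(long long gap, long page_size)
{
    return gap >= (256 + 1) * (long long)page_size;
}
```

For the question's program you would open /proc/self/maps, pass the pointer returned by mmap, and feed the result together with `sysconf(_SC_PAGESIZE)` to `can_grow_one_page`. A small gap would be consistent with the SIGSEGV.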

EDIT:

Additionally, kernels before v5.0 deny access to an address which is more than 64k+256 bytes below the stack pointer. See this kernel commit for details.
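Spelled out as arithmetic, that pre-5.0 rule amounts to roughly the following. This is a sketch based only on the 64k+256 figure above, not the kernel's actual code, and `old_kernel_allows` and `OLD_STACK_SLACK` are names invented here.

```c
#include <stdint.h>

/* Assumption: 64 KiB + 256 bytes of slack, as described above. */
#define OLD_STACK_SLACK (65536UL + 256UL)

/* Hypothetical model of the pre-5.0 check: returns 1 if a faulting
 * address is close enough to the stack pointer to be considered for
 * stack growth, 0 if the access would be denied outright. */
int old_kernel_allows(uintptr_t fault_addr, uintptr_t sp)
{
    return fault_addr + OLD_STACK_SLACK >= sp;
}
```

With the question's mapping, the stack pointer is in the main thread's stack, far above the new mapping, so this test fails unless the stack pointer is temporarily moved next to the mapping.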

This program works on x86 even with pre-5.0 kernels:

#include <sys/mman.h>
#include <stdint.h>
#include <stdio.h>

#define PAGE_SIZE   4096UL
#define GAP     (512 * PAGE_SIZE)

static void print_maps(void)
{
    FILE *f = fopen("/proc/self/maps", "r");
    if (f) {
        char buf[1024];
        size_t sz;
        while ( (sz = fread(buf, 1, sizeof buf, f)) > 0)
            fwrite(buf, 1, sz, stdout);
        fclose(f);
    }
}

int main()
{
    char *p;
    void *stack_ptr;

    /* Choose an address well below the default process stack. */
    asm volatile ("mov  %%rsp,%[sp]"
        : [sp] "=g" (stack_ptr));
    stack_ptr -= (intptr_t)stack_ptr & (PAGE_SIZE - 1);
    stack_ptr -= GAP;
    printf("Ask for a page at %p\n", stack_ptr);
    p = mmap(stack_ptr, PAGE_SIZE, PROT_READ | PROT_WRITE,
         MAP_PRIVATE | MAP_STACK | MAP_ANONYMOUS | MAP_GROWSDOWN,
         -1, 0);
    if (p == MAP_FAILED) {
        perror("mmap");
        return 1;
    }
    printf("Mapped at %p\n", p);
    print_maps();
    getchar();

    /* One page is already mapped: stack pointer does not matter. */
    *p = 'A';
    printf("Set content of that page to \"%s\"\n", p);
    print_maps();
    getchar();

    /* Expand down by one page. */
    asm volatile (
        "mov  %%rsp,%[sp]"  "\n\t"
        "mov  %[ptr],%%rsp" "\n\t"
        "movb $'B',-1(%%rsp)"   "\n\t"
        "mov  %[sp],%%rsp"
        /* Register operands: a memory operand addressed via %rsp would break while %rsp is swapped. */
        : [sp] "+&r" (stack_ptr)
        : [ptr] "r" (p)
        : "memory");
    printf("Set end of guard page to \"%s\"\n", p - 1);
    print_maps();
    getchar();

    return 0;
}
Petr Tesařík

Replace:

volatile char *c_ptr_1 = mapped_ptr - 4096; //1 page below

With

volatile char *c_ptr_1 = mapped_ptr;

Because:

The return address is one page lower than the memory area that is actually created in the process's virtual address space. Touching an address in the "guard" page below the mapping will cause the mapping to grow by a page.

Note that I tested the solution and it works as expected on kernel 4.15.0-45-generic.

Maxim Egorushkin
  • did you test if this works? I couldn't get it to grow anyway, even if I successfully read `c_ptr_1[0]` and it returns 0 and I can set it, reads, writes to [-1] *will* sigsegv. – Antti Haapala -- Слава Україні Jul 04 '19 at 13:34
  • OK then kernel version might matter! 4.18.0-24-generic x86_64 Ubuntu here. – Antti Haapala -- Слава Україні Jul 04 '19 at 13:40
  • @AnttiHaapala Added a kernel version for you. – Maxim Egorushkin Jul 04 '19 at 13:40
  • I had it fail on 4.15.0-50-generic x86_64 Ubuntu 18.04. – Thomas Jager Jul 04 '19 at 13:52
  • @ThomasJager I cannot confirm or deny your observations. – Maxim Egorushkin Jul 04 '19 at 14:00
  • Can you explain why you removed `volatile`? – St.Antario Jul 04 '19 at 14:18
  • @St.Antario Added back. I compiled your code with no optimizations, so `volatile` wasn't necessary for me. But it works with `volatile` just as well. – Maxim Egorushkin Jul 04 '19 at 14:23
  • @St.Antario please edit the question so that there are 2 accesses and the first access works and the second does not and then Maxim can verify it / Maxim: did you try accessing another page too? – Antti Haapala -- Слава Україні Jul 04 '19 at 14:59
  • @AnttiHaapala Added the example when accessing 1 page below the guard fails. – St.Antario Jul 04 '19 at 15:06
  • @St.Antario `mapped_ptr - 4095` is not supposed to work at all. This is what my answer is trying to say. – Maxim Egorushkin Jul 04 '19 at 15:09
  • @MaximEgorushkin So there is only 1 guard page created on the `mmap` with `MAP_GROWSDOWN`. I thought as soon as we touch page that is right below the guard it should be reserved. There is _This growth can be repeated until the mapping grows to within a page of the high end of the next lower mapping_ part about repeating the grow... – St.Antario Jul 04 '19 at 15:20
  • @AnttiHaapala and Maxim: Yeah the OP's updated code doesn't work for me. And the returned pointer = the bottom of the mapping according to `/proc/PID/maps`, not one page below like the man page claims. It's probably obsolete; `MAP_GROWSDOWN` might be broken or maybe finally removed in my 5.0.1 (Arch Linux) kernel. https://lwn.net/Articles/294001/ (from 2008) says it's not usable, and should be deprecated. The behaviour I see on Linux 5.0.1 is consistent with `MAP_GROWSDOWN` being silently ignored. The man page saying "used for stacks" is a total joke; it isn't. – Peter Cordes Jul 07 '19 at 08:54
  • @AnttiHaapala: Update: I'm sure it's not fully removed from the kernel. `/proc/PID/smaps` does show a `gd` flag. But it may only work for mappings that start larger than 1 page? The code in the question works for me after changing `4096` to `100*4096`. And the `maps` entry does actually change to a lower start address but the same end address, so it is literally growing downward when I start with a 400k mapping. https://bugs.centos.org/view.php?id=4767 may be related. – Peter Cordes Jul 07 '19 at 09:47
  • But contrary to the man page, the `mmap` return value *was* the start address of the mapping in `maps` – Peter Cordes Jul 07 '19 at 09:48
  • @PeterCordes hmm that could be an explanation – Antti Haapala -- Слава Україні Jul 07 '19 at 10:35