
I'm running some experiments with the stack, and the following has me stuck.

Linux gives a process an initial [stack] mapping of 132KiB. With ulimit -s unlimited, the stack can be expanded further as long as rsp is adjusted accordingly. So I set ulimit -s unlimited and ran the following program:

PAGE_SIZE     equ 0x1000

;mmap flags
PROT_READ     equ 0x01
PROT_WRITE    equ 0x02
MAP_ANONYMOUS equ 0x20
MAP_PRIVATE   equ 0x02
MAP_FIXED     equ 0x10

;syscall numbers
SYS_mmap      equ 0x09
SYS_exit      equ 0x3c

section .text

global _start

_start:
    ; page alignment
    and rsp, -0x1000

    ; call mmap 0x101 pages below the rsp with fixed mapping
    mov rax, SYS_mmap
    lea rdi, [rsp - 0x101 * PAGE_SIZE]
    mov rsi, PAGE_SIZE
    mov rdx, PROT_READ | PROT_WRITE
    mov r10, MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED
    mov r8, -1
    mov r9, 0
    syscall

    sub rsp, 0x80 * PAGE_SIZE
    mov qword [rsp], -1 ; SEGV

    mov rax, SYS_exit
    mov rdi, 0
    syscall

Despite adjusting rsp, it still segfaults, and I don't understand why. I manually created a fixed mapping at rsp - 0x101 * PAGE_SIZE, that is, 0x101 pages below rsp.

My expectation was that it would not interfere with expanding the stack (down to rsp - 0x80 * PAGE_SIZE in my case) until we hit the fixed mapping at rsp - 0x101 * PAGE_SIZE.

By the way, if I remove MAP_FIXED, the requested address is not honored and no segfault occurs (as expected). Here is the strace output:

mmap(0x7ffe4e0fe000, 4096, PROT_READ|PROT_WRITE, 
     MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x1526e3f3a000

But MAP_FIXED does the job:

mmap(0x7ffd8979c000, 4096, PROT_READ|PROT_WRITE, 
     MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7ffd8979c000

UPD: The segfault is not triggered if lea rdi, [rsp - 0x101 * PAGE_SIZE] is replaced with lea rdi, [rsp - 0x200 * PAGE_SIZE].

St.Antario
  • The red zone has nothing to do with this. The red zone is simply the 128 bytes below RSP and has nothing to do with growing the stack. The red zone exists as a space that can be used by programs without worrying about it being clobbered by a signal handler etc. You need to get away from thinking the red zone plays a part in this. – Michael Petch Jul 07 '19 at 18:01
  • Linux uses the concept of guard pages on the stack. This may be of some interest: https://lkml.org/lkml/2017/6/22/345 – Michael Petch Jul 07 '19 at 18:03
  • @MichaelPetch I thought that we are only safe to touch the memory within the red zone. Otherwise we have to adjust the `rsp` as specified here https://stackoverflow.com/a/56718182/2786156 – St.Antario Jul 07 '19 at 18:04
  • No, the red zone is simply a place you can store temporary data without it being clobbered. The thing the red zone buys you is that you don't have to use instructions like `sub rsp, ##` to reserve space for local variables (as long as they take up no more than 128 bytes below RSP and your function doesn't call other functions). If you need more space then you adjust RSP to account for it. You can access memory beyond that but you have to concern yourself with Linux guard pages. – Michael Petch Jul 07 '19 at 18:05
  • @MichaelPetch That's interesting. So the initial 132KiB mapping is sort of guard pages for the `[stack]`? – St.Antario Jul 07 '19 at 18:08
  • Presumably the OS tries to grow the stack in some fixed increment which simply does not fit between your 0x80 and 0x101 pages but does fit into the gap if you use 0x200. – Jester Jul 07 '19 at 18:13
  • @Jester I tried to sub `rsp, 0x1000 * PAGE_SIZE` that is below the fixed mapping, but segfault also occurred. No manually created mappings exist below. – St.Antario Jul 07 '19 at 18:21
  • You can't grow the stack **through** the fixed mapping for obvious reasons. – Jester Jul 07 '19 at 18:22
  • The interesting thing to try would be to see if the kernel treats that space as reserved, using `MAP_FIXED_NOREPLACE` or just a normal non-NULL hint address without either FIXED flag. (`mmap` won't randomly pick it, and it might be reserved with a mechanism that stops it from being allocated without `MAP_FIXED`) – Peter Cordes Jul 07 '19 at 21:06
  • @PeterCordes I did not find the `MAP_FIXED_NOREPLACE` macro declaration in `sys/mman.h` or in `man mmap`. The only hint was specified [here](https://elixir.bootlin.com/linux/latest/source/include/uapi/asm-generic/mman-common.h#L29). I use kernel 4.18.0. Trying to manually add the flag to the `mmap` arguments also resulted in a segfault. – St.Antario Jul 08 '19 at 07:41
  • @St.Antario: then your user-space glibc is too old, but your kernel is new enough. You can just define it yourself because the `mmap` function in glibc is just a thin wrapper for the system call; it doesn't need to understand the flags. – Peter Cordes Jul 08 '19 at 07:47

1 Answer


The Linux kernel enforces a gap between the stack and other mappings. If that gap cannot be maintained, the stack will not grow.

Relevant source code in mm/mmap.c, from line 2498

/* enforced gap between the expanding stack and other mappings. */
unsigned long stack_guard_gap = 256UL<<PAGE_SHIFT;

static int __init cmdline_parse_stack_guard_gap(char *p)
{
    unsigned long val;
    char *endptr;

    val = simple_strtoul(p, &endptr, 10);
    if (!*endptr)
        stack_guard_gap = val << PAGE_SHIFT;

    return 0;
}
__setup("stack_guard_gap=", cmdline_parse_stack_guard_gap);

and line 2424:

int expand_downwards(struct vm_area_struct *vma,
                   unsigned long address)
{
    struct mm_struct *mm = vma->vm_mm;
    struct vm_area_struct *prev;
    int error = 0;

    address &= PAGE_MASK;
    if (address < mmap_min_addr)
        return -EPERM;

    /* Enforce stack_guard_gap */
    prev = vma->vm_prev;
    /* Check that both stack segments have the same anon_vma? */
    if (prev && !(prev->vm_flags & VM_GROWSDOWN) &&
            (prev->vm_flags & (VM_WRITE|VM_READ|VM_EXEC))) {
        if (address - prev->vm_end < stack_guard_gap)
            return -ENOMEM;
    }

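You can see that the gap is adjustable via the stack_guard_gap kernel parameter, but the default is 256 pages. In your program the fixed mapping ends at rsp - 0x100 * PAGE_SIZE, which is only 0x80 (128) pages below the address you touch at rsp - 0x80 * PAGE_SIZE, so the gap does not fit and the stack is not grown. With 0x200 the mapping ends 0x17f (383) pages below that address, so the gap fits.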

Jester