2

I have written the following C code: [image: screenshot of the C source]

It simply allocates an array of 1000000 integers and another integer, and sets the first integer of the array to 0.

I compiled this using `gcc -g test.c -o test -fno-stack-protector`.

It gives a very weird disassembly: [image: screenshot of the disassembly]

Apparently it keeps allocating 4096 bytes on the stack in a loop, ORing every 4096th byte with 0, and once it reaches 3997696 bytes it allocates a further 2184 bytes. It then proceeds to set the 4000000th byte (which was never allocated) to 5.

Why doesn't it allocate the full 4000004 bytes that were requested? Why does it "or" every 4096th byte with 0, which is a useless instruction?

Am I understanding something wrong here?

NOTE: This was compiled with gcc version 9.3. gcc version 7.4 does not emit the loop that ORs every 4096th byte with 0, but it still allocates only 3997696 + 2184 = 3999880 bytes and still sets the 4000000th byte to 5.

Peter Cordes
Rwitaban Goswami
  • Enable optimizations. Also it's not allocating anything, it's moving the stack pointer around. – Blindy Aug 13 '20 at 19:42
  • Can't reproduce with gcc 10 or clang 10 – JCWasmx86 Aug 13 '20 at 19:43
  • Good luck with that huge uninitialised buffer, on the stack. – Weather Vane Aug 13 '20 at 19:43
  • I don't want optimizations. I want to understand the disassembly. Please understand that I'm not writing any code to make a program. This is part of a course to understand assembly language. – Rwitaban Goswami Aug 13 '20 at 19:54
  • @Blindy isn't allocating any memory on the stack just moving the stack pointer? – Rwitaban Goswami Aug 13 '20 at 19:56
  • There's no such thing as "allocating memory" on the stack, it's already allocated, it's why you can write in any part of it just fine. It's also why it's limited in size at thread creation time, because at that point it actually *is* being allocated. – Blindy Aug 13 '20 at 20:07
  • "proceeds to set the 4000000th byte" I think it actually sets the first dword. `movl` is a dword, not byte. The high address displacement is negative, so this actually accesses the first array entry, not the last. – ecm Aug 13 '20 at 20:10
  • Don't post pictures of text; copy/paste into a code-formatting block. Also, describe what OS / distro you're on; apparently they're configuring GCC with `-fstack-check` on by default, but not all distros do that (yet). – Peter Cordes Aug 13 '20 at 20:45
  • The OS is Ubuntu-20.04 on WSL2. I would have copy pasted it on a code formatting block but my PC has run into issues and is not booting, and the screenshots were all I had. – Rwitaban Goswami Aug 13 '20 at 21:02
  • I have downvoted your post because you posted pictures of text. Replace the pictures by text and I'll retract my downvote. – fuz Aug 13 '20 at 21:52

3 Answers

10

This is a mitigation for the Stack Clash class of vulnerabilities, known since the 90s or earlier but only widely publicized in 2017. (See stack-clash.txt and this blog entry.)

If the attacker can arrange for a function with a VLA of attacker-controlled size to execute, or can arrange for a function with a large fixed-size array to execute when the attacker controls the amount of stack already used in some other way, they can cause the stack pointer to be adjusted to point into the middle of other memory, and thereby cause the function to clobber said memory, usually leading to arbitrary code execution.

The machine code GCC has emitted here is part of the Stack Clash Protection feature. Roughly, whenever the stack pointer must be adjusted by more than the minimum page size, it is moved incrementally, one minimum-page-sized unit at a time, with a memory access after each adjustment. This ensures that, if at least one guard page (a page mapped PROT_NONE) is present, the access will fault and generate a signal before the stack pointer is moved into unrelated memory. The main thread always has guard pages, and by default newly created threads do too (their guard size can be configured in the pthread thread creation attributes).
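As a rough illustration of the stepping described above, the following C sketch splits a frame size into page-sized steps plus a final smaller adjustment. The names `probe_plan` and `plan_probes` are hypothetical and are not taken from GCC; the sketch only models the arithmetic, not the actual probing instructions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type, not a GCC structure: how a frame is split for probing. */
typedef struct {
    size_t full_steps; /* number of page-sized moves, each followed by a probe */
    size_t remainder;  /* final adjustment, smaller than one page */
} probe_plan;

/* Split frame_bytes into page-sized steps plus a remainder. */
static probe_plan plan_probes(size_t frame_bytes, size_t page_bytes)
{
    assert(page_bytes != 0);
    probe_plan plan = { frame_bytes / page_bytes, frame_bytes % page_bytes };
    return plan;
}
```

For the frame in the question, `plan_probes(3999880, 4096)` gives 976 steps (976 × 4096 = 3997696 bytes) and a remainder of 2184 bytes, which matches the loop bound and the final adjustment seen in the disassembly.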

R.. GitHub STOP HELPING ICE
  • [Linux process stack overrun by local variables (stack guarding)](https://stackoverflow.com/q/60058873) has some more info about Linux stack allocation and this checking. It looks like this code was compiled with `gcc -fstack-check`, apparently on by default in the OP's distro version of GCC. – Peter Cordes Aug 13 '20 at 20:43
5

There are two things here:

  • the "no-op" ORs read and write each page on the stack. These are required because the stack is usually mapped with one or more guard pages below it. When a guard page is touched, the stack is expanded downward. But if you touch memory below the guard page, a SIGSEGV occurs.

  • the x86-64 System-V ABI specifies a 128-byte red zone below the stack pointer. This area can be freely used by the compiler to store local variables too. If you add 128 to 3997696 + 2184 = 3999880, you get 4000008. Note that the stack always has to be aligned to at least 8, not 4, so that any int64_t or double is aligned (as noted by Peter Cordes, larger arrays need to be 16-byte aligned, hence the requirement for the entire stack to be 16-byte aligned too), so 4000004 would be plain wrong!
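The red-zone arithmetic can be checked with a short C sketch. The helper names `usable_bytes` and `locals_fit` are hypothetical, and the sketch assumes a leaf function (one that calls nothing else), since that is the case in which the compiler may place locals in the red zone.

```c
#include <assert.h>
#include <stddef.h>

/* Red zone size given by the x86-64 System V ABI. */
#define RED_ZONE_BYTES 128u

/* Hypothetical helper: bytes a leaf function can use below its frame base,
   i.e. the explicit stack-pointer adjustment plus the red zone. */
static size_t usable_bytes(size_t sp_adjust)
{
    return sp_adjust + RED_ZONE_BYTES;
}

/* Returns 1 when locals_bytes of locals fit in that space, else 0. */
static int locals_fit(size_t locals_bytes, size_t sp_adjust)
{
    assert(locals_bytes > 0);
    return usable_bytes(sp_adjust) >= locals_bytes;
}
```

With the numbers from the question, `usable_bytes(3997696 + 2184)` is 4000008, so the 4000004 bytes of locals (1000000 four-byte ints plus one more int) fit even though the stack pointer itself moves by only 3999880 bytes.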

  • 1) That is really interesting. Is there a compiler directive to stop gcc from doing this? 2) That is really interesting, I did not know that – Rwitaban Goswami Aug 13 '20 at 20:22
  • 1
    @RwitabanGoswami: If you're on Linux, this doesn't explain the `or` because that's not how Linux's stack-growth mechanism works. Linux allows growing the stack by any number of pages (up to `ulimit -s`) as long as RSP has moved first. This looks like your GCC has `-fstack-check` on by default; see [Linux process stack overrun by local variables (stack guarding)](https://stackoverflow.com/q/60058873) for that and Linux stack growth in general. – Peter Cordes Aug 13 '20 at 20:35
  • 1
    BTW, the x86-64 SysV ABI requires that local arrays larger than 16 bytes be 16-byte aligned. With `push %rbp` realigning the stack by 16 on function entry, the allocation size will be a multiple of 16. – Peter Cordes Aug 13 '20 at 20:37
  • @PeterCordes thanks, I didn't remember the exact specifics, hence "at least", updated. – Antti Haapala -- Слава Україні Aug 14 '20 at 04:28
1

I had the same issue, and the only flag that can disable this "weird" assembly code is `-fno-stack-clash-protection`.

szumi