Why is the Linux kernel marking a heap as mapped, when the program break has never been altered, libc is not linked, and a heap should not exist?

Question

I've tried to reduce the code to something more minimal to demonstrate the problem.

BITS 64

global _start:function      
global BIG_BAD_BLOCK:data   

section .rodata     progbits  alloc   noexec  nowrite  align=4 
    hc_str_a: db "Example",   0x0

section .bss        nobits    alloc   noexec  write    align=4 
    personZ:  resb  20  ;
    personX:  resb  20  ;

section FAKE_HEAP   nobits    alloc   noexec  write    align=1
    NEXT_ADDR:      resq 1      ; pointer to the next available byte within the block
    BIG_BAD_BLOCK:  resb 204800 ; 200 KB chunk of memory


;   cpu instructions
section .text       progbits  alloc   exec    nowrite  align=16

    _start:                              ; start(argc, argv, envp)                               // the kernel calls _start() with the args provided by the execve() system call
        mov rdi, rsp                     ; (int*)          rdi         = rsp                     // argc is the first thing on the stack
        add rdi, 8                       ; (char**)        rdi         = (unsigned long) rdi + 8 // argv begins 8 bytes after argc
        mov ecx, dword [rsp]             ; (unsigned int)  ecx         = *((int*) rsp)           // argc : the kernel passes initial program arguments on the stack, rather than by registers
        mov eax, ecx                     ; (unsigned int)  eax         = ecx
        mov ebx, 8                       ; (unsigned int)  ebx         = 8
        mul ebx                          ; (unsigned int)  eax         = argc * 8                // how many bytes long is the argv array
        add eax, 8                       ; (unsigned int)  eax        += 8                       // byte length of argv + 8 byte offset for argv's trailing null
        add rax, rdi                     ; (char**)        rax         = (unsigned long) argv + eax
        mov rsi, rax                     ; (char**)        rsi         = envp                    //
        mov eax, ecx                     ; (unsigned int)  eax         = argc                    // 
        call init                        ; init(argc, argv, envp)     
        nop                              ;                                                       // ignore the do-nothing instruction
    init:
        push rax                         ;                                                       // save the register we are going to clobber (argc)
        push rdi                         ;                                                       // save the register we are going to clobber (argv)
        push rsi                         ;                                                       // save the register we are going to clobber (envp)
        push rbp                         ; (stackframe*)  (--rsp)      = (stackframe*) rbp       // save copy of old top-of-stack at the new top-of-stack 8 bytes down
        mov rbp, rsp                     ; (stackframe*)   rbp         = rsp                     // (this provides us a fixed pointer to the old top-of-stack)
        call init_heap                   ; init_heap()                                           // let's ignore this for now
        mov rax, qword [rbp + 24]        ; (unsigned int)  eax         = argc                    // saved by the first push, so it sits above rbp
        mov rdi, qword [rbp + 16]        ; (char**)        rdi         = argv
        mov rsi, qword [rbp +  8]        ; (char**)        rsi         = envp
        call main                        ; (unsigned int)  eax         = main(argc, argv, envp)
        call exit                        ; exit(eax)
    main:
        nop
        mov eax, 0                       ; (unsigned int)  eax         = 0
        ret                              ; return 0;

    init_heap:
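        mov qword [NEXT_ADDR], BIG_BAD_BLOCK    ; NEXT_ADDR will start by pointing to the first byte of BIG_BAD_BLOCK
                                                ; NOTE: there is no ret here, so init_heap falls through into malloc below and returns via malloc's ret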

    malloc: ; malloc(byteCount)
        push qword [NEXT_ADDR]
        add qword [NEXT_ADDR], rax      ; NEXT_ADDR += byteCount
        pop rax                         ; rax = (void*) memoryChunk
        ret

    exit:               ; exit(statusCode)
        mov rdi, rax    ; rdi = (int) statusCode
        mov rax, 60     ; rax = (unsigned long int) 60   // system call #60 is SYS_exit
        syscall         ; SYS_exit(statusCode)           // tell kernel to kill this process



;   To assemble:
;   nasm -felf64 -gdwarf -o HeapProblem.o ./HeapProblem.asm

;   To link:
;   ld -o HeapProblem.bin HeapProblem.o

I assemble and link using the commands above. This is a single assembly file. No includes. No macros. No libraries. Not even libc. It's just the one file you see there, assembled with the Netwide Assembler (NASM) and linked with ld, and that one object file is the only input the linker processes.

This implies:

A traditional malloc is not being loaded.
No system calls to brk are being made.
No system calls to mmap are being made.

Nothing is happening except what you see in that one assembly file. No other code should be interacting with this binary in any way, except for the Linux kernel itself, which loads the binary into memory when a shell invokes execve() on the path to the bin file.

After assembling and linking the file, we go to execute/debug it.

gdb HeapProblem.bin

b *_start

run one simple test

info proc mappings

process 5423
Mapped address spaces:

          Start Addr           End Addr       Size     Offset objfile
            0x400000           0x401000     0x1000        0x0 /var/www/html/ASM/HeapProblem.bin
            0x600000           0x601000     0x1000        0x0 /var/www/html/ASM/HeapProblem.bin
            0x601000           0x633000    0x32000        0x0 [heap]
      0x7ffff7ffb000     0x7ffff7ffd000     0x2000        0x0 [vvar]
      0x7ffff7ffd000     0x7ffff7fff000     0x2000        0x0 [vdso]
      0x7ffffffde000     0x7ffffffff000    0x21000        0x0 [stack]
  0xffffffffff600000 0xffffffffff601000     0x1000        0x0 [vsyscall]

maintenance info sections

Exec file:
    `/var/www/html/ASM/HeapProblem.bin', file type elf64-x86-64.
 [0]     0x004000b0->0x00400124 at 0x000000b0: .text ALLOC LOAD READONLY CODE HAS_CONTENTS
 [1]     0x00400124->0x0040012c at 0x00000124: .rodata ALLOC LOAD READONLY DATA HAS_CONTENTS
 [2]     0x0060012c->0x00600158 at 0x0000012c: .bss ALLOC
 [3]     0x00600158->0x00632160 at 0x0000012c: FAKE_HEAP ALLOC
 [4]     0x00000000->0x00000030 at 0x0000012c: .debug_aranges READONLY HAS_CONTENTS
 [5]     0x00000000->0x00000053 at 0x0000015c: .debug_info READONLY HAS_CONTENTS
 [6]     0x00000000->0x0000001b at 0x000001af: .debug_abbrev READONLY HAS_CONTENTS
 [7]     0x00000000->0x00000066 at 0x000001ca: .debug_line READONLY HAS_CONTENTS

print & BIG_BAD_BLOCK

$1 = (<data variable, no debug info> *) 0x600160

So. The first mapping is to the binary itself for the .text and .rodata sections. Cool. That makes sense.

The second mapping is also to the binary, seemingly for the .bss and FAKE_HEAP sections. Which is also more-or-less what we expected. Though it should be noted that the second mapping is larger than what is needed for .bss, but not large enough to completely fit both .bss and FAKE_HEAP. It can only contain .bss and part of FAKE_HEAP.

Then we've got the 3rd mapping, marked as [heap].

I expected 1 of 2 things to happen:

A) The kernel would fail to recognize my FAKE_HEAP section as a true heap, and would simply include the entire thing in the same segment as .bss

OR

B) The kernel would recognize my FAKE_HEAP as being an unusual/non-standard section with attributes that are consistent with a heap, and would thus mark the entire section as a heap. With the mapping start and end addresses exactly matching the memory address onto which FAKE_HEAP was loaded, and its natural end-boundary.

What actually happened: The kernel seems to have recognized a heap that starts at an arbitrary point within my FAKE_HEAP. It does not align. My FAKE_HEAP starts at 0x600158, with BIG_BAD_BLOCK starting at 0x600160. The kernel says the heap starts at 0x601000. That is 3,752 bytes into my structure. Which does not make sense at all. There's no reason that the kernel should think the heap begins 3,752 bytes past the beginning of this structure.
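For reference, the numbers in that paragraph can be checked with a few lines of arithmetic. This is only an illustrative Python sketch, not part of the program. It assumes a 4 KiB page size (consistent with the 0x1000-sized mappings in the gdb output), and the helper names `page_down` and `page_up` are made up for the example. It only shows that both ends of the [heap] mapping fall on page boundaries around FAKE_HEAP; it does not by itself explain why the label is applied.

```python
PAGE_SIZE = 0x1000  # assumption: 4 KiB pages, consistent with the 0x1000 mappings above


def page_down(addr: int) -> int:
    """Round an address down to the start of its page (hypothetical helper)."""
    return addr & ~(PAGE_SIZE - 1)


def page_up(addr: int) -> int:
    """Round an address up to the next page boundary (hypothetical helper)."""
    return (addr + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1)


# Values copied from "maintenance info sections" and "info proc mappings"
BSS_START = 0x60012C
FAKE_HEAP_START = 0x600158
FAKE_HEAP_END = 0x632160
HEAP_MAP_START = 0x601000
HEAP_MAP_END = 0x633000

offset_into_fake_heap = HEAP_MAP_START - FAKE_HEAP_START

print(offset_into_fake_heap)          # 3752
print(hex(page_down(BSS_START)))      # 0x600000
print(hex(page_up(FAKE_HEAP_START)))  # 0x601000
print(hex(page_up(FAKE_HEAP_END)))    # 0x633000
```

In other words, 0x601000 is the first page boundary after the start of FAKE_HEAP, and 0x633000 is the end of FAKE_HEAP rounded up to a whole page.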

So, finally, a restatement of the question(s):

Should the kernel be detecting the FAKE_HEAP section or BIG_BAD_BLOCK symbol as a heap at all?

If so, why does the start address not match up with either the section or symbol start address?

If not, why is a heap being detected at all?

How is this heap being detected?

I need to understand why this is happening. Because I cannot find a clear logical or programmatic reason for this behavior. I've been researching this problem for the past 12 hours straight and I cannot figure this out. I've been searching for issues in the assembly, the linker, and the kernel itself.

_"How is this heap being detected?"_ See _fs/proc/task_mmu.c_ which has `if (vma->vm_start <= mm->brk && vma->vm_end >= mm->start_brk) { name = "[heap]";` — Jester, Oct 05 '22 at 13:03
Your `_start` code could be a lot simpler, and I think you have RDI and RSI reversed. `int main(int argc /* EDI */, char *argv[] /* RSI */, char *envp[] /* RDX */)`. Or I guess you're passing args in different registers to your custom `init` function? But anyway, see zwol's answer on [How Get arguments value using inline assembly in C without Glibc?](https://stackoverflow.com/a/50261564) for a `mov`-load and two LEA instructions to get argc, argv, and envp into registers. At the very least, you should either use `shl` by 3 to scale by 8, or copy-and-multiply with `imul esi, ecx, 8`. But L — Peter Cordes, Oct 05 '22 at 13:11
It makes little sense to `call init` / `nop` / `init:`. Seems like you could just let execution fall into that `init` label, no need to push a return address or jump over a `nop` that didn't need to be there in the first place. You aren't going to return to there, I assume, or else execution would fall into `init:` but this time without a return address on the stack. You could call `init_heap` *before* setting up argc/argv/envp. If you do that last, you just need to load register args for `main`, not spill/reload. — Peter Cordes, Oct 05 '22 at 13:13
I probably should have mentioned this was a stripped down version of a larger example: https://www.lita.engineering/editor/?file=/ASM/Car.asm&mode=assembly_x86 I do realize there's some structural things that might seem 'to make little sense', but an effort was being made to somewhat simulate the whole `_start()` -> `libc_csu_init()`, `libc_start_main()` -> `main()` call flow (in spirit) but in a very reduced and simplified example. — Charles Lentz, Oct 05 '22 at 13:23
@Jester I am reviewing https://github.com/torvalds/linux/blob/64291f7db5bd8150a74ad2036f1037e6a0428df2/fs/proc/task_mmu.c#L332 but it may take me some time to fully understand what is going on here. This does look like it is directly related to how /proc/{pid}/maps determines the name of a mapping. But I am still left wondering why the mapping is being distinguished and recognized apart from the mapping for the `.bss` section, especially if it is still in the same segment (as determined by ELF Program Headers of type `LOAD` ). I do appreciate the lead, though. — Charles Lentz, Oct 05 '22 at 13:41
This seems to be related to something that gdb is doing. When I run the program without the debugger, inserting an infinite loop in `main` so that it doesn't exit immediately, and `cat /proc/PID/maps`, the `[heap]` indicator does not appear. But it does when the program is run under gdb. It's not clear what the difference might be, since the mappings look identical otherwise. — Nate Eldredge, Oct 05 '22 at 13:42
@NateEldredge That is extremely valuable information. I had not realized this. Thank you for sharing this, I will dig in. — Charles Lentz, Oct 05 '22 at 13:44
But I am curious why the question is important. The concept of "heap" is just an abstraction for programmers; at the level of the process and OS, there are only mappings. There obviously has to be an anonymous mapping of about that size at that address to accommodate your `BIG_BAD_BLOCK`, so the question just seems to be about the tag `[heap]` that the kernel includes in `/proc/PID/maps`. The tag seems to be included based on some heuristic that, as above, is mainly for the benefit of the programmer, and AFAIK has no effect on the actual behavior of the program. — Nate Eldredge, Oct 05 '22 at 13:47
In this case, for some reason, the heuristic is wrong. But I have trouble seeing why that should be of any concern beyond being a purely cosmetic issue that is only visible in development and debugging. If you do figure out what's going on, how will it be helpful, other than perhaps to file a very low-priority kernel bug report? — Nate Eldredge, Oct 05 '22 at 13:50
Everything in programming is important. Especially when you're working at a low level and **especially** when it concerns a fundamental. "Heap" may be an abstract concept. But we do have a real memory mapping behind it (even if it is being misidentified as to what it is). And that memory mapping still has an alignment issue I can't explain. Every detail is important. What if someone were using maps information programmatically as an input to more complex logic to perform a higher level task? Any form of misidentification at the kernel level will have effects that ripple through entire appstack — Charles Lentz, Oct 05 '22 at 14:16
Just because the code doesn't make any library calls does not mean that heap allocation services are unavailable, same is true for file system services. There is no scan of the code for direct syscalls that don't use the library functions. — Erik Eidt, Oct 05 '22 at 15:09
What's the alignment issue that you talk about? As far as I can tell, what's there makes sense. You've used very small `align` values, so the linker can more or less do what it wants, as long as it can allocate segments as needed by your sections. — Thomas Jager, Oct 05 '22 at 15:27
The "heap" in this sense is just the break, as manipulated by the `brk` system call. See the Linux man page for differences between the kernel interface vs. the POSIX library function. I haven't checked in detail, but my understanding is that it effectively just makes the BSS bigger, expanding that mapping. I find it a real stretch to say that this tag in /proc/PID/maps has any significant chance of mattering for correctness of any sane program; the mapping itself still works. — Peter Cordes, Oct 05 '22 at 15:28
@NateEldredge By default GDB disables address space layout randomisation, which results in the break being at the highest address mapped by the executable. If randomisation is active, the break is a random amount higher and therefore does not match the mapping in question. — Timothy Baldwin, Oct 05 '22 at 15:45
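Putting the first and last comments together, the labelling check can be modelled roughly as below. This is a hedged Python sketch, not kernel code: `Vma` and `is_labelled_heap` are hypothetical names, the condition is the one quoted from _fs/proc/task_mmu.c_ above, and the break values are assumptions (0x633000 when randomisation is off, as under GDB, and an arbitrary made-up offset when it is on).

```python
from typing import NamedTuple


class Vma(NamedTuple):
    """Hypothetical stand-in for one line of /proc/PID/maps."""
    start: int
    end: int


def is_labelled_heap(vma: Vma, start_brk: int, brk: int) -> bool:
    # Same shape as the quoted check:
    #   vma->vm_start <= mm->brk && vma->vm_end >= mm->start_brk
    return vma.start <= brk and vma.end >= start_brk


# The two mappings that cover .bss and FAKE_HEAP in the gdb output
file_backed_page = Vma(0x600000, 0x601000)
anonymous_rest = Vma(0x601000, 0x633000)

# Assumption: with randomisation off, the break starts at the page-aligned
# end of the highest mapping of the executable, i.e. 0x633000.
no_aslr_brk = 0x633000
print(is_labelled_heap(anonymous_rest, no_aslr_brk, no_aslr_brk))    # True
print(is_labelled_heap(file_backed_page, no_aslr_brk, no_aslr_brk))  # False

# Assumption: with randomisation on, the break sits some random amount higher.
# The offset below is invented purely for illustration.
aslr_brk = 0x633000 + 0x1234000
print(is_labelled_heap(anonymous_rest, aslr_brk, aslr_brk))          # False
```

Under these assumptions the anonymous mapping gets the label only because its end address touches an empty break (start_brk == brk), which would match the observation that the tag disappears when the program is run outside GDB.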
