
I have my allocation function:

malloc_:
    pushq   %rbp
    movq    %rsp, %rbp

    mov     %rdi, %rcx            # store size
    movl    $9, %eax              # system call 9: sys_mmap
    movq    $0, %rdi              # start address
    movq    %rcx, %rsi            # size
    movl    $3, %edx              # page flags <--- PROT_READ | PROT_WRITE
    mov     $34, %r10             # mem flags <--- MAP_PRIVATE | MAP_ANONYMOUS
    movl    $-1, %r8d             # file descriptor
    movl    $0, %r9d              # offset
    syscall

    cmp     $0, %rax
    jg      .L1.malloc_exit_
    mov     $0, %rax
.L1.malloc_exit_:
    popq    %rbp
    retq

.globl main
main:
    pushq   %rbp
    movq    %rsp, %rbp

    mov     $512, %rdi
    call    malloc_
    cmp     $0, %rax
    je      exit
    # movl  (%eax), %edx          # <--------- crash
    mov     (%rax), %rdx          # <--------- norm
exit:

I've marked the place that I don't understand. Why is this happening?

If I use the system malloc, then in both cases everything is fine.

xperious
  • You are simply getting an address that does not happen to fit into 32 bits. I have not decoded your `mmap` arguments and you unfortunately have not commented them, but there is a `MAP_32BIT` flag you can use to request low memory. Not recommended and you should generally use 64 bit pointers unless you are sure your address fits into 32 bits. – Jester Jan 04 '22 at 11:36
  • @Jester How do I fix it? I really don't understand – xperious Jan 04 '22 at 11:38
  • The fix is to use `rax`. You are on a 64 bit system, pointers are generally 64 bits. – Jester Jan 04 '22 at 11:39
  • @Jester Hm, why does the system malloc work correctly? – xperious Jan 04 '22 at 11:40
  • That's weird, what system are you on? Normally glibc `malloc` *does* return memory outside the low 32 bits. Use `strace ./a.out` and `ltrace ./a.out` to see what system calls happen in your program. Maybe make a dummy ENOSYS call with a high RAX so you can find the top of main easily in strace output. Anyway, don't override the address-size to 32-bit unless you specifically *want* to truncate pointers (like for the x32 ABI); 64-bit is the default in machine code so it's most efficient (smaller code size). – Peter Cordes Jan 04 '22 at 11:48
  • @Jester I've added descriptions of the flags – xperious Jan 04 '22 at 11:50
  • @PeterCordes I used 32-bit registers because in my other code I couldn't write `mov $0, (%rdi)`, only `movl $0, (%edi)` – xperious Jan 04 '22 at 11:52
  • On my system, x86-64 Arch GNU/Linux, replacing `call malloc_` with `call malloc` and running `strace ./a.out` produced the expected segfault. `strace` shows: `brk(NULL) = 0x56249128e000` / `brk(0x5624912af000) = 0x5624912af000` / `--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=NULL} ---`. (glibc malloc uses `brk` for small allocations, and the kernel brk system call returns the current break. glibc calls once at startup to find the current break, then with non-NULL to request a new break.) – Peter Cordes Jan 04 '22 at 11:53
  • `movl $0, (%rdi)` assembles just fine: 32-bit operand-size, 64-bit address-size. Those are separate things. `movq $0, (%rdi)` is also fine if you want to do a 64-bit store. Of course `mov $0, (%anything)` won't assemble because the operand-size is ambiguous; it could be a byte, word, dword, or qword store. If you have existing source code written for 32-bit mode, you're going to have to change the registers to port it to x86-64. (A short demo follows after this comment thread.) – Peter Cordes Jan 04 '22 at 11:55
  • @PeterCordes system `malloc` works if you disable PIE because then the break is in the low addresses. – Jester Jan 04 '22 at 11:55
  • @Jester: Oh interesting, yeah it does, with an initial break of `0x1010000`. (After I added a `pop %rbp` / `ret` instead of falling off the end of `main` :P). That's not particularly close to the .data or .bss sections; I wonder why Linux chose that? – Peter Cordes Jan 04 '22 at 11:57
  • @PeterCordes I looked at the code of the glibc malloc wrapper... it has no logic for brk, it just calls mmap. – xperious Jan 04 '22 at 11:58
  • @xperious if you insist on using 32 bit pointers, pass the `MAP_32BIT` flag as I already told you. – Jester Jan 04 '22 at 11:59
  • @Jester No, 32-bit pointers are not needed – xperious Jan 04 '22 at 12:00
  • If you use `(%eax)` and `(%edi)` then you **do** need 32 bit pointers. Those **are** 32 bit registers. – Jester Jan 04 '22 at 12:01
  • @Jester: Yeah, but I think they've realized they were mixed up about operand-size vs. address-size, and are now able to use 64-bit addresses everywhere like a normal person, even with `movl`, after seeing my comment about that. – Peter Cordes Jan 04 '22 at 12:02
  • @Jester I will rewrite my code. I'm just a newbie in assembly – xperious Jan 04 '22 at 12:02
  • Okay so problem solved then? :) – Jester Jan 04 '22 at 12:03
  • @Jester yeah, thanks all – xperious Jan 04 '22 at 12:03
  • @xperious: Unless glibc changed in the last 6 months or so, you're missing something. My `strace` output proves that `call malloc` resulted in a `brk` system call. I'm using Arch Linux's glibc 2.33-5 binary package. glibc's default tuning is to use brk for small allocations, mmap for large ones. [In malloc, why use brk at all? Why not just use mmap?](https://stackoverflow.com/a/56629384), and [How/where is sbrk used within malloc.c?](https://stackoverflow.com/q/20863330). Also https://sourceware.org/glibc/wiki/MallocInternals mentions use of `brk`. – Peter Cordes Jan 04 '22 at 12:16
  • @PeterCordes I was wrong; I was reading the musl sources, and musl does have logic for brk – xperious Jan 04 '22 at 12:24
  • You're using MUSL as your system's `/lib/libc.so`, or otherwise linking your test program against it? That's very unusual, all mainstream GNU/Linux distros use glibc. MUSL sources are of course only relevant to explain / understand the use of `brk` you saw (or whatever other cause of 32-bit addresses) if you're actually using MUSL. – Peter Cordes Jan 04 '22 at 12:28
  • @PeterCordes I just read the musl sources – xperious Jan 04 '22 at 12:37
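
A minimal sketch of the operand-size vs. address-size distinction from the comments above, assuming `%rax` holds a pointer returned by the 64-bit `mmap` wrapper:

    # Operand-size comes from the mnemonic suffix (b/w/l/q); address-size
    # comes from the width of the registers in the addressing mode.
    movb    $0, (%rax)            # 8-bit store through a 64-bit address: fine
    movl    $0, (%rax)            # 32-bit store through a 64-bit address: fine
    movq    $0, (%rax)            # 64-bit store through a 64-bit address: fine
    # movl  $0, (%eax)            # 32-bit address-size truncates the pointer
                                  # and faults if the mapping is above 4 GiB

The suffixed forms assemble even with a memory destination; plain `mov $0, (%rdi)` is rejected only because the operand size is ambiguous, not because of the 64-bit address.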

1 Answer


In your code, this:

   syscall

   cmp     $0, %rax
   jg .L1.malloc_exit_

is not right. It should be:

    syscall

    cmp    $0xfffffffffffff000, %rax
    jbe    .L1.malloc_exit_

This is typical of UNIX system calls: the raw Linux system call returns a small negative value (`-errno`, in the range -4095..-1) to indicate an error, and the libc thunk is responsible for converting that into -1 and updating `errno`, or whatever your C bindings look like. If you want to understand system calls, it is often informative to write a small C program that uses them, then step into the thunk with gdb to see what it does. Or get the source.
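
For reference, a sketch of the question's wrapper with such a check applied (register shuffling simplified; `$-4096` is `0xfffffffffffff000` written as a signed immediate):

malloc_:
    pushq   %rbp
    movq    %rsp, %rbp

    movq    %rdi, %rsi            # length = requested size
    movl    $9, %eax              # system call 9: sys_mmap
    xorl    %edi, %edi            # addr = NULL, let the kernel choose
    movl    $3, %edx              # PROT_READ | PROT_WRITE
    movl    $34, %r10d            # MAP_PRIVATE | MAP_ANONYMOUS
                                  # (adding MAP_32BIT, 0x40, would force a
                                  # sub-2GiB address, per the comments)
    movl    $-1, %r8d             # fd = -1 for an anonymous mapping
    xorl    %r9d, %r9d            # offset = 0
    syscall

    cmpq    $-4096, %rax          # errors are -4095..-1 (-errno)
    jbe     .L1.malloc_exit_      # unsigned <=: any real pointer
    xorl    %eax, %eax            # error: return NULL
.L1.malloc_exit_:
    popq    %rbp
    retq

On error this returns NULL as the question intended; a fuller port would negate `%rax` first to recover the errno value.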

mevets
  • Yes, look at the code. – mevets Jan 04 '22 at 13:40
  • Anyway, with some tweaks this answer could be a useful code-review, just you should be clear you're not answering the primary question about the segfault (from truncating a pointer). Also, for malloc specifically, on x86-64, it's likely safe to assume that Linux will continue to use a 50:50 split with memory above the non-canonical hole being kernel addresses, so there's no way x86-64 mmap can return a pointer to a signed-negative address. (I don't think you can remap the VDSO pages...) – Peter Cordes Jan 04 '22 at 13:42
  • Comparing against 0 with `test %rax,%rax` / `jl .Lerror` / `ret` is thus a fun and useful optimization, *if* done intentionally and commented to note that the actual error-return range is unsigned `>= -4095ULL` or `> -4096ULL` (sketched after this comment thread). – Peter Cordes Jan 04 '22 at 13:44
  • Oh, right, `jbe .L1.malloc_exit_` is strangely written to jump on success, so you're correctly leaving only `rax > -4096ULL` as errors. (Or you would be if you were comparing against an immediate, instead of contents of memory at that absolute address!) I was expecting fall-through (to a `ret`) on success because that's more efficient, and was just looking at the `e` in the branch condition, missed the `b` instead of `a`. My bad. – Peter Cordes Jan 04 '22 at 13:48
  • No, that is encoding configuration-specific information into an application. There is no reason that a pointer [2^63..2^64-4096], or a pointer 0, is not a valid return from a generic memory function. The obsession with 0 is an ANSI C problem, and it is merely one convention that user addresses occupy the lower half; many unices expanded the user address range into the traditional kernel one. – mevets Jan 04 '22 at 13:51
  • But x86-64 Linux `mmap` *isn't* a generic memory function. Anyway, notice that I used `jl error` to jump on error, so I'm considering `0` a valid return value because it's non-negative. Because it is, when [the `mmap_min_addr` sysctl](https://wiki.debian.org/mmap_min_addr) is set to zero instead of the default 64k. – Peter Cordes Jan 04 '22 at 13:57
  • Also note I didn't say it was certain or guaranteed that future x86-64 Linux wouldn't ever get patched with a 1:3 kernel:user split, but things have changed a lot since hitting the limits of 32-bit mode. 47 or 56 bits of user-space addresses is enough for most processes. Some HPC workloads may want all the RAM in a single process. But otherwise, having boatloads of memory for caching and running many processes is highly useful. You want lots of kernel addr space to make that efficient on Linux; it likes having at least 2x RAM of kern vaddr space. – Peter Cordes Jan 04 '22 at 14:02
  • I was also thinking that x86-64's non-canonical hole in the address space is an extra nudge in the direction of an even split that didn't exist in 32-bit mode, but you're right that it's of course not a showstopper. But with PML5 giving us 57-bit virt addresses, and physical capped at 52 by the current page-table format, we have more virt addr space than phys, thus little reason to mess with the user/kernel split to allow even more for user-space. Anyway, these are just my thoughts on why it's *likely* safe enough, at least for a toy program. I'd think harder about long-lived binaries! – Peter Cordes Jan 04 '22 at 14:05
  • @Peter, LAM formalizes the convention of using the upper half of the address space for kernel and makes it part of the CPU architecture. – prl Jan 04 '22 at 18:32
  • @prl: Oh neat, I hadn't seen that Linear Address Masking [upcoming feature](https://software.intel.com/content/dam/develop/external/us/en/documents-tps/architecture-instruction-set-extensions-programming-reference.pdf). An OS that used a different split could simply not enable LAM, or enable it for both "user" and "supervisor" so HW would sign-extend based on bit #47 or #56. The pointer sign-extension rules don't actually interact with paging / privilege levels, AFAICT. But in terms of terminology, yes, Intel is using "user" to describe bit 63 clear, "supervisor" for bit 63 set. – Peter Cordes Jan 04 '22 at 22:40
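
A sketch of that `test`-based shortcut, assuming (as discussed above) that x86-64 Linux `mmap` never returns an address with the sign bit set; the `.Lerror` label follows the comment's naming:

    # Shortcut error check: relies on no valid mmap return having bit 63
    # set; the precise error range is only unsigned >= -4095ULL.
    syscall
    test    %rax, %rax            # test clears OF, so jl acts like js here
    jl      .Lerror               # sign bit set: %rax holds -errno
    retq                          # success: pointer in %rax
.Lerror:
    xorl    %eax, %eax            # return NULL on error
    retq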