Why does a write system call print a bunch of junk when you mov edx, [msgLen] from db $-msg - same as with the address?

Question

For some reason [msgLen] and msgLen produce the same result. I know that I ought to use equ or something, but I want to know why this isn't working. Is NASM ignoring the dereference? It prints my string and a bunch of junk because it thinks the string is bigger than it is, right? Thanks!

Please see the assembly below:

_start:
    mov edx,[msgLen]
    mov ecx,msg
    mov ebx,1
    mov eax,4
    int 0x80
    mov eax,1
    int 0x80

section .data
    msg db 'Zoom',0xA
    msgLen db $-msg

Well, if you don't want to use `EQU` (which you should do), then `msgLen db $-msg` would have to be `msgLen dd $-msg`, since you read a full dword into EDX. I assume you are on Linux; what garbage does yours print? — Michael Petch, Dec 24 '20 at 19:30
"Zoom .shstrtab.text.data @ @ ". Yes, I'm on Ubuntu. Oh OK, so it is expecting a two byte int, but I'm only giving one byte and whatever is in the byte next to it? — Max Brenner, Dec 24 '20 at 19:37
`mov edx,[msgLen]` takes the 4 byte value at `msgLen` and moves it to EDX. You only allocated `msgLen` as a 1 byte value (`db`), which is why you want to change `msgLen db` to `msgLen dd` (`dd` is a 32-bit DWORD value). When you used `db`, the extra 3 bytes after `msgLen` are used to form a 32-bit value, and I suspect in your environment they are non-zero. Most people, though, would have used `EQU` to create an assemble-time constant with `msgLen EQU $-msg` and then done `mov edx,msgLen` to move that constant to EDX. — Michael Petch, Dec 24 '20 at 19:41

Peter Cordes · Answer 1 · 2020-12-24T21:18:50.357

Use a debugger to look at EDX after the load, or use strace ./a.out to see the length you actually pass to the system call. It will be different (address vs. value loaded). Both ways are buggy in different ways that happen to produce large values, so the effect only happens to be the same. Debugging / tracing tools would have clearly shown you that NASM isn't "ignoring the dereference", though.

mov edx, msgLen is the address, right next to msg. It will be something like 0x804a005 if you link into a standard Linux non-PIE static executable with ld -melf_i386 -o foo foo.o.

mov edx, [msgLen] loads 4 bytes from that address (the width of EDX), where you only assembled the size into 1 byte in the data section. The 3 high bytes come from whatever comes next in the part of the file that's mapped to that memory page.

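When I assemble with just nasm -felf32 foo.asm and link with ld -melf_i386, that happens to be 3 bytes of zeros, so I don't get any garbage printed from the dword-load version. (x86 is little-endian, so the size byte is the low byte of the dword, and zeros in the 3 bytes above it leave the value at 5.)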

But it seems your non-zero bytes are debug info. I've found NASM's debug info is the opposite of helpful, sometimes making GDB's disassembly confused (layout reg / layout asm sometimes fail to disassemble a whole block after a label) so I don't use it.

But if I use nasm -felf32 -Fdwarf, then I do get 7173 from that dword load that goes past the end of the .data section. That's a different large number, so this is just wrong in a different way, not the same problem. 7173 is 0x1c05, so it corresponds to db 5, 0x1c, 0, 0. i.e. your calculated length of 5 is the low byte, but there's a 0x1c after it. yasm -gdwarf2 gives me 469762053 = 0x1c000005.
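To make that arithmetic concrete, here is a minimal sketch that places the same 4 bytes in .data explicitly, so the result doesn't depend on what the linker happens to put after the section. The file name layout.asm and the label trailing are made up for illustration; the bytes 0x1c, 0, 0 just stand in for what followed msgLen in my -Fdwarf build.

```nasm
; layout.asm (hypothetical file name)
; Build: nasm -felf32 layout.asm && ld -melf_i386 -o layout layout.o
global _start

section .text
_start:
    mov   edx, [msgLen]       ; dword load: msgLen's byte plus the 3 bytes after it
    xor   ebx, ebx            ; EBX = 0 (exit status if the value matches)
    cmp   edx, 7173           ; 5 + (0x1c << 8) = 0x1c05 = 7173
    setne bl                  ; EBX = 1 if the loaded value is anything else
    mov   eax, 1              ; __NR_exit in the 32-bit int 0x80 ABI
    int   0x80

section .data
    msg:      db 'Zoom', 0xA
    msgLen:   db $ - msg      ; 0x05, the low byte of the dword (little-endian)
    trailing: db 0x1c, 0, 0   ; stand-in for whatever follows msgLen in memory
```

Running ./layout and then echo $? should print 0, i.e. the dword load really does see 0x00001c05 rather than 5.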

If you'd used db $-msg, 0,0,0 or dd $-msg, you could load a whole dword. (To load and zero-extend a byte into a dword register, use movzx edx, byte [mem])
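As a sketch of the movzx option applied to the program from the question (assuming the same 32-bit Linux int 0x80 ABI; the global _start and section .text lines are additions the original snippet didn't show):

```nasm
; Build: nasm -felf32 foo.asm && ld -melf_i386 -o foo foo.o
global _start

section .text
_start:
    movzx edx, byte [msgLen]  ; load exactly 1 byte, zero-extend into EDX: EDX = 5
    mov   ecx, msg            ; buffer address
    mov   ebx, 1              ; fd 1 = stdout
    mov   eax, 4              ; __NR_write
    int   0x80
    xor   ebx, ebx            ; exit status 0
    mov   eax, 1              ; __NR_exit
    int   0x80

section .data
    msg:    db 'Zoom', 0xA
    msgLen: db $ - msg        ; still only 1 byte of storage
```

The other option is to keep mov edx, [msgLen] and change the data line to msgLen: dd $ - msg, so the 4 bytes the load reads are all bytes you actually assembled.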

`write()` behaviour with a large length

If you give it some very large length, write will go until it reaches an unreadable page, then return the length it actually wrote. (It doesn't check the whole buffer for readability before starting to copy_from_user. And it returns the number of bytes written if that's non-zero before encountering an unreadable page. You can only get -EFAULT if an unreadable page is encountered right away, not if you pass a huge length that includes some unmapped pages later.)

e.g. with mov edx, msgLen (the label address)

$ nasm -felf32 -Fdwarf foo.asm
$ ld -melf_i386 -o foo foo.o
$ strace -o foo.tr ./foo          # write trace to a file so it doesn't mix with terminal output
Zoom
 foo.asmmsgmsgLen__bss_start_edata_end.symtab.strtab.shstrtab.text.data.debug_aranges.debug_info.debug_abbrev.debug_lin! '  6& 9B_ 
                                                                                                                                        $ cat foo.tr
execve("./foo", ["./foo"], 0x7fffdf062e20 /* 53 vars */) = 0
write(1, "Zoom\n\5\34\0\0\0\2\0\0\0\0\0\4\0\0\0\0\0\0\220\4\10\35\0\0\0\0\0"..., 134520837) = 4096
exit(1)                                 = ?
+++ exited with 1 +++

134520837 is the length you passed, 0x804a005 (the address of msgLen). The system call writes 4096 bytes, 1 whole page, before getting to an unmapped page and stopping early. It doesn't matter what number you pass higher than that, because there's only 1 page before the end of the mapping. (And msg: is apparently right at the start of that page.)

On the terminal (where \0 prints as empty), you mostly just see the printable characters; pipe into hexdump -C if you want a better look at the binary data. It includes bits of metadata from the file, because the kernel's ELF program loader works by mmaping the file into memory (with a MAP_PRIVATE read-write no-exec mapping for that part). Use readelf -a and look at the ELF program headers: they tell the kernel which parts of the file to map into memory where, with what permissions.

Fun fact: If I redirect to /dev/null (strace ./foo > /dev/null), the kernel's write handler for that special device driver doesn't even check the buffer for permissions, so write() actually does return 134520837.

write(1, "Zoom\n\5\0\0[...]\0\0\220\4\10"..., 134520837) = 134520837
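Since write returns the number of bytes it actually wrote, a program can notice a short write or an error by comparing EAX against the length it asked for. A minimal sketch of that check, assuming the 32-bit int 0x80 ABI and an assemble-time length (the exit-status convention here is my own choice, not anything the kernel requires):

```nasm
; Build: nasm -felf32 check.asm && ld -melf_i386 -o check check.o
global _start

section .text
_start:
    mov   eax, 4              ; __NR_write
    mov   ebx, 1              ; fd 1 = stdout
    mov   ecx, msg
    mov   edx, msgLen         ; immediate: the assemble-time constant 5
    int   0x80                ; EAX = bytes written, or a negative errno value

    xor   ebx, ebx            ; exit status 0: everything was written
    cmp   eax, msgLen
    je    .done
    mov   ebx, 1              ; exit status 1: short write or error
.done:
    mov   eax, 1              ; __NR_exit
    int   0x80

section .rodata
    msg:    db 'Zoom', 0xA
    msgLen  equ $ - msg       ; no storage; just a number the assembler substitutes
```

With the buggy lengths above, this comparison would fail in the terminal case (4096 returned vs. 134520837 requested) even though the call itself didn't return an error.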

Using EQU properly

And yes, you should be using equ so you can use the actual length as an immediate. Having the assembler calculate it and then assemble that byte into the data section is less convenient. This prints exactly the right size, and is more efficient than using an absolute address to reference 1 constant byte.

   mov   edx, msg.len         ; mov r32, imm32
  ...

section .rodata
  msg:  db 'Zoom',0xA
  .len  equ $-msg

(Using a NASM . local label is unrelated to using equ; I'm showing that too because I like how it helps organize the global namespace.)

Also semi-related to how ld lays out sections into ELF segments for the program-loader: recent ld pads sections for 4k page alignment or something like that to avoid getting data mapped where it doesn't need to be. Especially out of executable pages.

Just a note. One comment the OP made was about the garbage they saw. Among other things, there appeared to be symbol table data for debugging. Try adding `-F stabs` to the assembly step and one would probably find the bytes after `msgLen` are debug info. — Michael Petch, Dec 24 '20 at 20:45
@MichaelPetch: Yup, thanks, updated. (And reorganized my answer into better sections.) — Peter Cordes, Dec 24 '20 at 21:19
