
I have the following program written in x86_64 assembly:

BITS 64

section .data
    text db "Hello World!",10
    length db 13

section .text
    global _start

_start:
    mov rax, 1
    mov rdi, 1
    mov rsi, text
    mov rdx, [length]
    syscall

    mov rax, 1
    mov rdi, 1
    mov rsi, text
    mov rdx, [length]
    syscall

    mov rax, 60
    mov rdi, 0
    syscall

As expected, it prints "Hello World!" twice. Then, I add the following to my .data section:

text2 db "Another string",10
length2 db 15

I change nothing else. I was expecting to get the same output, but now I get no output at all. Then I change the second copy of the code that prints the text to `mov rsi, text` and `mov rdx, [length2]`. Now I get only the output `Another string`, when I was expecting to see both strings printed. What is going on here, and how can I print two different strings?

  • The first thing that jumps out at me is `mov rdx, [length]`. So you are reading 64 bits into rdx, but length is only defined to be 8 bits. I don't know what's ending up in rdx, but it may not be what you expect. – David Wohlferd Jan 27 '22 at 20:50
  • Yes, moving into `dl` instead of `rdx` fixed it! Thanks a lot! If you post this as an answer, I'll accept it, if not I'll post my own answer and accept in 2 days. – Christoffer Corfield Aakre Jan 27 '22 at 21:48
  • That's not the right fix. You should zero-extend the byte into RDX with `movzx edx, byte [len]`. ([Why can't I move directly a byte to a 64 bit register?](https://stackoverflow.com/q/22621340)). Or better use an EQU constant: [How does $ work in NASM, exactly?](https://stackoverflow.com/q/47494744) – Peter Cordes Jan 28 '22 at 03:06

1 Answer


As I mentioned in the comments, the first thing about this code that jumped out at me is the `mov rdx, [length]` statement. Since the destination is `rdx`, this instruction reads 8 bytes of data starting at `length`. However, you've declared `length` with `db`, which means you're only defining 1 byte.

What's in the next 7 bytes? Hard to say. Sections are typically 'aligned', which means there are probably a few 'padding' bytes after `length` so that whatever follows (`_start`, say) lands on a 16-byte (or so) boundary. Since the code works when there is just one string, those padding bytes are presumably all zero.

But when you put the second string in place, suddenly the bytes after `length` aren't zero. Assuming `text2` sits directly after `length`, they're the ASCII values of the 7 characters in 'Another'. That means that instead of trying to output 13 bytes, you're asking `write` to output several zillion. Oops, right?
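Here is a minimal sketch of what that 8-byte load picks up. It assumes the new `text2`/`length2` lines were added directly after `length` in `.data`, and that the file is built as a plain static executable the same way as the original program. The exit-status check is only there for illustration.

```nasm
BITS 64

section .data
    text    db "Hello World!",10
    length  db 13                   ; a single byte: 0x0d
    text2   db "Another string",10  ; assumed to follow length directly
    length2 db 15

section .text
    global _start

_start:
    mov rdx, [length]               ; 8-byte load: 0x0d, then 'A','n','o','t','h','e','r'
    mov rax, 0x726568746f6e410d     ; those same 8 bytes read as a little-endian qword
    xor edi, edi
    cmp rdx, rax
    setne dil                       ; exit status 0 if they match, 1 otherwise
    mov eax, 60                     ; sys_exit
    syscall
```

The low byte is still 13, but the seven bytes above it turn the count into a number far larger than anything in the program's memory.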

Replacing the move with `mov dl, [length]` might seem like it should solve the problem, but there's a catch.

`dl` is the lowest byte of the `rdx` register. So if `rdx` is 0 before you do the move, everything works fine. But if `rdx` is 0xffffffffffffffff, then `mov dl, [length]` sets only the lowest byte, which leaves `rdx` holding 0xffffffffffffff0d.
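A small sketch of that partial-register behavior, using `mov rdx, -1` as a stand-in for whatever stale value `rdx` might hold. The exit-status check is just an illustrative way to observe the result.

```nasm
BITS 64

section .data
    length db 13

section .text
    global _start

_start:
    mov rdx, -1                     ; rdx = 0xffffffffffffffff (stand-in for leftover garbage)
    mov dl, [length]                ; replaces only the low byte of rdx
    mov rax, 0xffffffffffffff0d     ; upper 7 bytes untouched, low byte now 0x0d
    xor edi, edi
    cmp rdx, rax
    setne dil                       ; exit status 0 if they match, 1 otherwise
    mov eax, 60                     ; sys_exit
    syscall
```

Running it and checking the exit status (for example with `echo $?` in a shell) should show 0, meaning the upper bytes survived the byte-sized move.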

Why does it work like that? Historical reasons. Back when registers were only 16 bits long, being able to set the low byte with `dl` and the upper byte with `dh` seemed like a good idea. When the world moved to 32 bits, they didn't want to break existing code, so you could still do the `dl`/`dh` thing. Indeed, you could even set the entire lower 16 bits of the 32-bit `edx` register by using the `dx` register. But they didn't create a corresponding way to set just the upper 16 bits.

64 bits mostly follows the same logic, with one important exception: writing to a 32-bit register automatically zeros the upper 32 bits of the corresponding 64-bit register. So `mov edx, [length]` reads 4 bytes (32 bits) into `edx` and zeros all the upper bits of `rdx`.

So I'd recommend that you either change `length` to a 32-bit value (declare it with `dd` instead of `db`) and use `mov edx, [length]` (which is what I'd probably do), or that you zero `rdx` before you move the byte into `dl`. The most efficient way to zero all of `rdx` is `xor edx, edx`. This zeros the upper bits because of what I explained above about writing a 32-bit register, while being a (1 byte) shorter instruction than `xor rdx, rdx`.
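Putting the first option together, a sketch of the program with both strings might look like the following. It assumes the second write is meant to print `text2`; the file name in the build command below is hypothetical.

```nasm
BITS 64

section .data
    text    db "Hello World!",10
    length  dd 13                   ; dd = 4 bytes, so a 32-bit load reads only this value
    text2   db "Another string",10
    length2 dd 15

section .text
    global _start

_start:
    mov rax, 1                      ; sys_write
    mov rdi, 1                      ; fd 1 = stdout
    mov rsi, text
    mov edx, [length]               ; 32-bit load; upper half of rdx is zeroed
    syscall

    mov rax, 1
    mov rdi, 1
    mov rsi, text2                  ; assumption: print the second string here
    mov edx, [length2]
    syscall

    mov rax, 60                     ; sys_exit
    mov rdi, 0
    syscall
```

It can be built with something like `nasm -f elf64 hello.asm -o hello.o` followed by `ld hello.o -o hello`. If you would rather keep the lengths as `db`, the `movzx edx, byte [length]` form suggested in the comments loads the single byte and zero-extends it in one instruction.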

David Wohlferd
  • The normal way to load a byte on 386 or later is `movzx edx, byte [len]`. Or better use an EQU constant. Canonicals: [Why can't I move directly a byte to a 64 bit register?](https://stackoverflow.com/q/22621340) / [How to load a single byte from address in assembly](https://stackoverflow.com/q/20727379) – Peter Cordes Jan 28 '22 at 03:07
  • Thanks @PeterCordes, I went with your approach. Will you post an answer? – Christoffer Corfield Aakre Jan 29 '22 at 00:13
  • Answers already exist for those; that makes it a duplicate since the bug here is loading a qword when you want to load a byte. Or if you mean for using `.len equ $-text` then doing `mov edx, text.len`, that's somewhat covered by the Q&A about `$` in NASM. – Peter Cordes Jan 29 '22 at 01:43
  • @PeterCordes gotcha, thanks for the help again – Christoffer Corfield Aakre Jan 29 '22 at 02:42
  • @David: Note that modern `ld` puts `.data` and `.text` in non-overlapping pages, so the initial static data from the file won't be sitting there in the same read-only exec private mapping as the actual code. (That's why the minimum executable size is significantly bigger these days, with default options). Older `ld` used to put stuff next to each other in the file, but yeah, good point that section alignment means there will be at least some zeros. (Or the whole rest of a page, if the kernel copies just the asked-for parts of `.data` into a zeroed page instead of a CoW mapping.) – Peter Cordes Jan 29 '22 at 02:51