I'm trying to define a byte in Assembly language inside my .text section. I know data should go to the .data section but I was wondering why it gives me a segmentation fault when I do it. If I define the byte inside .data, it doesn't give me any errors, unlike .text. I am using a Linux machine running Mint 19.1 and using NASM + LD to compile and link the executable.

This runs without segmentation faults:

global _start
section .data
db 0x41
section .text
_start:
    mov rax, 60    ; Exit(0) syscall
    xor rdi, rdi
    syscall

This gives me a segfault:

global _start
section .text
_start:
    db 0x41
    mov rax, 60     ; Exit(0) syscall
    xor rdi, rdi
    syscall

I'm using the following commands to assemble and link it:

nasm -felf64 main.s -o main.o
ld main.o -o main

I expect the program to run without a segmentation fault, but it crashes when I use `db` inside .text. I suspect that .text is read-only and that this may be the cause. Am I correct? Can someone explain why my second code example segfaults?

Peter Cordes
Lucas
  • 1
    Because you put your byte into the execution path so the CPU tried to decode it as instruction which faulted. Use `objdump -d` to see what it means. Put it after your code if you insist on it being in `.text`. Or move the `_start` label to the beginning of your actual code. Yes, `.text` is read-only but you did not attempt to write to it so that's fine. – Jester Apr 11 '19 at 23:50
  • 2
    Since you're using NASM, it optimizes `mov rax,60` into `mov eax,60`, so the instruction doesn't have the REX prefix you'd expect from the source. (That would have given you a SIGILL from 2 REX prefixes for one instruction.) Your manually-encoded REX prefix for `mov` does change it into a `mov` to R8D instead of EAX, though: `41 b8 3c 00 00 00 mov r8d,0x3c`. Check with `objdump -drwC -Mintel`. – Peter Cordes Apr 11 '19 at 23:57
  • 1
    Yeah and loading `r8d` instead of `eax` will of course remove the loading of the function number for the system call so it will no longer be an `exit` and it will return to continue execution except there is no more valid code to execute. Segfault. – Jester Apr 12 '19 at 00:03

1 Answer

If you tell the assembler to assemble arbitrary bytes somewhere, it will. `db` is a pseudo-instruction that emits bytes, so `mov eax, 60` and `db 0xb8, 0x3c, 0, 0, 0` are exactly equivalent as far as NASM is concerned. Either one emits those 5 bytes into the output at the current position.

If you don't want your data decoded as (part of) instructions, don't put it where it will be reached by execution.
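Following Jester's comment under the question, here is a minimal sketch of two placements that keep a data byte in `.text` without it ever being executed. The labels `before_entry` and `after_code` are hypothetical names added for illustration; the build commands are the same `nasm -felf64` and `ld` lines from the question.

```nasm
global _start
section .text

before_entry:           ; hypothetical label, not in the original code
    db 0x41             ; placed before _start, so execution never reaches it

_start:                 ; ld uses this symbol as the entry point
    mov eax, 60         ; __NR_exit
    xor edi, edi        ; exit status 0
    syscall             ; exit does not return

after_code:             ; hypothetical label, not in the original code
    db 0x41             ; placed after the exit syscall, also never reached
```

Either placement alone is enough; both are shown only to make the contrast with the faulting version clear.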


Since you're using NASM (see footnote 1), it optimizes `mov rax,60` into `mov eax,60`, so the instruction doesn't have the REX prefix you'd expect from the source.

Your manually-encoded REX prefix for mov changes it into a mov to R8D instead of EAX:
41 b8 3c 00 00 00 mov r8d,0x3c

(I checked with `objdump -drwC -Mintel` instead of looking up which bit is which in the REX prefix. I only remember that REX.W is 0x48, but 0x41 is a REX.B prefix in x86-64.)
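To make the prefix effect concrete, here is a small sketch that should assemble to the same six bytes twice. The labels `prefixed` and `direct` are hypothetical, and flat binary output is an assumption made here so the bytes are easy to inspect.

```nasm
bits 64

; REX prefix layout: 0100WRXB. 0x41 = 0100 0001, so only the B bit is set.
; REX.B extends the register number encoded in the opcode (b8+r),
; turning register 0 (EAX) into register 8 (R8D).

prefixed:
    db 0x41             ; hand-written REX.B prefix
    mov eax, 60         ; b8 3c 00 00 00

direct:
    mov r8d, 60         ; 41 b8 3c 00 00 00
```

Assuming the file is saved as `rexdemo.s` (a made-up name), `nasm -f bin rexdemo.s -o rexdemo.bin` followed by `ndisasm -b 64 rexdemo.bin` lets you compare the two encodings.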

So instead of making a `sys_exit` system call, your code runs `syscall` with EAX=0, which is `__NR_read`. (The Linux kernel zeros all the registers other than RSP before process startup, and in a statically linked executable `_start` is the true entry point, with no dynamic linker code running first, so RAX is still zero.)

$ strace ./rex 
execve("./rex", ["./rex"], 0x7fffbbadad60 /* 54 vars */) = 0
read(0, NULL, 0)                        = 0
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=NULL} ---
+++ killed by SIGSEGV (core dumped) +++

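Execution then falls through into whatever follows `syscall`, which in this case is `00 00` bytes that decode as `add [rax], al`. The `read` call returned 0 in RAX, so that instruction accesses address 0 and segfaults, which matches `si_addr=NULL` in the strace output. You would have seen this if you'd run your code inside GDB.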


Footnote 1: If you'd used YASM, which doesn't optimize to 32-bit operand size:

Intel's manuals say that it's illegal to have 2 REX prefixes on one instruction. I expected an illegal-instruction fault (#UD machine exception -> kernel delivers SIGILL), but my Skylake CPU ignores the first REX prefix and decodes it as mov rax, sign_extended_imm32.

Single-stepping, it's treated as one long instruction, so I guess Skylake chooses to handle it like other cases of multiple prefixes, where only the last one of a type has an effect. (But remember this is not future-proof: other x86 CPUs could handle it differently.)


Peter Cordes
  • When you wrote "YASM doesn't optimize", I assumed you meant that it encoded `mov rax, imm` as `48 b8 imm64`. But then you said it is executed as `mov rax, imm32`. For a minute, I thought you had made a mistake. :-) But then I guessed that YASM does a *partial* optimization and encodes it as `48 c7 c0 imm32`. – prl Apr 12 '19 at 02:22
  • @prl: oh yes, worded more explicitly. It always chooses the shortest encoding for the instruction you ask for (based on the numeric value), but it *won't* change the operand-size to something you didn't ask for. In Intel syntax, `mov r64, sign_extended_imm32` and `mov r64, imm64` are both 64-bit operand-size. In fact, in YASM `mov rdi, symbol` chooses the sign_extended_imm32 encoding, so it won't link into a PIE executable. (NASM chooses `mov r64,imm64` with a 64-bit absolute relocation for runtime fixup). Yet another reason to never write `mov r64, symbol`, always `mov r32` or `LEA [rel]`. – Peter Cordes Apr 12 '19 at 02:27
  • What happens if there is a REX prefix followed by a non-REX prefix? Like segment overrides, `repe`/`repne`, `lock`, `asize`/`osize`. – ecm Sep 06 '21 at 21:04
  • 2
    @ecm: Again, the manual says REX must be the last prefix. On SKL, `41 66 b8 ff ff ff ff` ignores the REX entirely, and writes AX=-1, with the single-step stopping after the imm16 of `mov ax,-1`. It doesn't write R8W despite the REX.B=1, like it would with prefixes in a valid order. I didn't test extensively, and of course there's zero guarantee that any future CPU, or existing other CPUs, will behave like Skylake. If you want to dig deeper, a separate Q&A about behaviour of real CPUs with invalid prefix combos would be a good place to post results. – Peter Cordes Sep 06 '21 at 21:42