I am writing a project in x86-64 Assembly and I want to write an efficient switch statement using a branch lookup table. However, I get position independence errors.

I'll start with my code. The assembly was taken from this answer.

Assembly:

global mySwitch

section .text

mySwitch:
    jmp [.jump_table + 2 * edi]
    .jump_table: dw .one, .two
.one:
    mov eax, 123
    ret
.two:
    mov eax, 321
    ret

C:

#include <stdio.h>

int mySwitch(int arg);

int main() {
    printf("%d\n", mySwitch(1));
}

I am trying to compile it with the following commands:

nasm -f elf64 -w+all -w+error switch.asm -o switch_asm.o
gcc -c -Wall -Wextra -std=c17 -O2 switch.c -o switch_c.o
gcc switch_asm.o switch_c.o -o switch

but the third one returns the following error:

/usr/bin/ld: switch_asm.o: relocation R_X86_64_32 against `.text' can not be used when making a PIE object; recompile with -fPIE
collect2: error: ld returned 1 exit status
make: *** [makefile:4: all] Error 1

Using the -fPIE switch is against the rules of the assignment (and also does not help), and I do not know what I am missing (previously, the problem was caused by a missing rel or wrt ..plt).

Update: Changing my assembly code slightly to 64-bit addresses and a lea instruction like so:

    lea rax, [rel .jump_table]
    jmp [rax + rdi * 8]
    .jump_table: dq .one, .two

compiles and works as expected. I'd love to post this as an answer showing how to write a switch statement in asm, but I cannot, because this question was closed as a duplicate even though it is not one. Oh well.
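For comparison, the same dispatch shape can be sketched in GNU C with the "labels as values" extension (supported by GCC and Clang, not ISO C). This is only an illustration of what the working assembly does, not a replacement for it: the function name `my_switch_c`, the bounds check, and the `-1` fallback are assumptions added here and are not part of the original code.

```c
/* GNU C sketch using the "labels as values" extension (not ISO C).
   my_switch_c is a hypothetical name, not part of the original code. */
int my_switch_c(unsigned long idx)
{
    /* Table of 8-byte absolute code addresses, like `dq .one, .two`. */
    static void *const jump_table[] = { &&one, &&two };

    /* Assumption: out-of-range indices return -1.
       The assembly in the update has no such bounds check. */
    if (idx >= sizeof jump_table / sizeof jump_table[0])
        return -1;

    /* Indexed indirect jump, like `jmp [rax + rdi * 8]`. */
    goto *jump_table[idx];

one:
    return 123;
two:
    return 321;
}
```

Calling `my_switch_c(1)` takes the second table entry, just as `mySwitch(1)` does in the assembly version.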

Maurycyt
  • How do you expect to fit 32-bit (?) addresses into 16-bit `dw` (data words)? The Q&A you're copying from is 16-bit code, and you only changed the addressing-mode to be 32-bit instead of 64-bit when porting to x86-64, which is weird. Even if you did link a non-PIE executable with `gcc -no-pie`, you'd then run into `R_X86_64_16` problems. For the actual question you asked about, duplicate of [32-bit absolute addresses no longer allowed in x86-64 Linux?](https://stackoverflow.com/q/43367427) – Peter Cordes Apr 20 '22 at 09:50
  • I do not expect to. That is why I am asking the question: I do not know what to do to make it work. I did not know that the post I linked was specific to 16-bit code. I still do not know how to make it work. Changing dw to dd or dq does not help. – Maurycyt Apr 20 '22 at 09:59
  • Do that *and* link with `gcc -no-pie`, you have to fix both parts, that's why I mentioned them both, and said the `-no-pie` thing was the solution to the first error, the one you mention in the title. See also [How to remove "noise" from GCC/clang assembly output?](https://stackoverflow.com/q/38552116) for how to look at compiler output for examples, although that will use GAS syntax instead of NASM. But GAS .intel_syntax is similar except for directives like `.quad` instead of `dq`. – Peter Cordes Apr 20 '22 at 10:05
  • Linking with extra flags is not an option for me. – Maurycyt Apr 20 '22 at 10:06
  • Ok, that's weird. Non-PIE executables are a good way to get stuff working before you understand all the x86-64 details of how to use RIP-relative addressing, but yeah the code you edited into the question is correct if you don't need it to actually be position-independent. (Runtime fixups for 64-bit absolute addresses). Although normally you'd put your jump table in `.rodata`, not inline with the code where it can create a performance problem (since the default path for branch prediction of an indirect jump is fall-through, into that `dq`; it'll also get pulled into L1d as well as L1I). – Peter Cordes Apr 20 '22 at 10:19
  • As for duplicates about x86-64 jump tables, turns out [How do you create jump tables in x86/x64?](https://stackoverflow.com/q/58364998) already shows one using RIP-relative LEA. It's MASM syntax, not NASM, though. – Peter Cordes Apr 20 '22 at 10:20
  • Yea I mean sadly we have strictly defined compilation flags and I cannot use others than the ones I posted (with some additional warning flags). What you said helped a bit but a friend of mine had to also chip in with some obscure knowledge of his own and now I managed to resolve the problem. So, thanks! – Maurycyt Apr 20 '22 at 10:20
  • So you want this for production use, but you're hand-writing in assembly while you barely know it? I assumed this was just for learning purposes. Or is it a school assignment? If so, I'd recommend that your professor should make the fixed build commands include `-fno-pie -no-pie` to make x86-64 asm somewhat more beginner-friendly until the details / limitations of RIP-relative addressing have been covered. (feel free to link them my comment here.) – Peter Cordes Apr 20 '22 at 10:26
  • Anyway, [GCC Jump Table initialization code generating movsxd and add?](https://stackoverflow.com/q/52190313) shows how GCC makes `switch` tables that are *actually* position-independent, using 32-bit relative offsets to be added to RIP, instead of loading 64-bit absolute pointers directly. – Peter Cordes Apr 20 '22 at 10:27
  • Yea, this is for a university assignment. Unfortunately I can tell you with absolute confidence that the professor will disregard suggestions about adding additional flags. Assembly during the course is a bit rushed because it's not the main focus of the course. Thanks for the help. I will take a look at the link about GCC jump tables at my earliest convenience. – Maurycyt Apr 20 '22 at 10:30
  • BTW, since you mentioned efficiency: If there are really only two cases, a conditional branch would be a more efficient way to implement this `switch`. Or much better, since the only difference in the two code paths is the return value, don't branch at all, just index an array of data, or ALU select `mov eax, 123` / `mov edx, 321` / `test edi,edi` / `cmovnz eax, edx`. Or compute it as `imul eax, edi, 321-123` / `add eax, 123` or similar. (Good throughput, only 2 uops, but imul has 3-cycle latency on mainstream CPUs.) – Peter Cordes Apr 20 '22 at 10:33
  • Or `sub edi, 1` (create a 0 or 0xFFFFFFFF) / `and edi, 123-321` / `lea eax, [rdi+321]` works, I think. Only 3 instructions vs. 4 for the cmov version, but does still need two 32-bit immediates in the machine code. – Peter Cordes Apr 20 '22 at 10:36
  • Haha no the original problem has 8 branches indexed 0 through 7. I just needed a minimal reproducible example. – Maurycyt Apr 20 '22 at 10:44
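
The branchless alternatives suggested in the comments above can be checked with a small C sketch. It assumes the two-case minimal example only (argument 0 or 1, mapping to 123 and 321); the helper names `select_mul`, `select_mask`, and `select_cmov` are hypothetical and introduced here for illustration. Whether a compiler actually emits `imul`, `lea`, or `cmov` for these depends on the compiler and its options.

```c
#include <stdint.h>

/* Hypothetical helper names; arg is assumed to be 0 or 1. */

/* Multiply-and-add form: imul eax, edi, 321-123 / add eax, 123 */
int select_mul(int arg)
{
    return arg * (321 - 123) + 123;
}

/* Mask form: sub edi, 1 / and edi, 123-321 / lea eax, [rdi+321] */
int select_mask(int arg)
{
    uint32_t mask = (uint32_t)arg - 1u;            /* 0 -> 0xFFFFFFFF, 1 -> 0 */
    uint32_t delta = mask & (uint32_t)(123 - 321); /* 123-321 or 0, mod 2^32 */
    return (int)(delta + 321u);                    /* wraps to 123, or stays 321 */
}

/* Conditional-select form, which a compiler may turn into cmov. */
int select_cmov(int arg)
{
    return arg ? 321 : 123;
}
```

All three return 123 for an argument of 0 and 321 for an argument of 1, matching `.one` and `.two` in the question's example.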
