0

I'm currently learning x86/x64 asm and I wanted to try to make a jump table, but I can't figure out what I'm doing wrong.

The concept itself is not new to me; I just can't figure out why it doesn't work. I saw [ ] used a few times while I was researching this, but I'm not sure if it is the right way to do it.

.data
var qword 10

.code
main proc

    mov rax, var
    jmp [table]
back:
    ret

table:
qword subroutine, subroutine2

subroutine:

    mul var
    jmp back

subroutine2:

    mul var
    jmp back

main endp
end

When I step through the code, it skips the jmp instruction, and on ret it gives an "access violation reading location 0x00000000" error.

Peter Cordes
NMITIMEN
  • 2
    I'd recommend putting your jump table in the `.rdata` section, not mixed with code. IDK MASM syntax very well, maybe you need a `qword ptr` to distinguish it from `jmp table`? Check disassembly to see whether you got a memory-indirect jmp or a rel32. MASM syntax is stupid about square brackets sometimes. An array of qword symbol addresses is what you want, though. (Or for position-independent code, just an array of offsets relative to some base, and maybe make the offsets 16 or 32-bit zero-extended with movzx.) – Peter Cordes Oct 13 '19 at 15:43
  • BTW, one-operand `mul` is usually unnecessary; I'd use `imul rax, rax` since you already take an input in rax. – Peter Cordes Oct 13 '19 at 15:45
  • 1
    @PeterCordes Yes, `jmp [table]` and `jmp table` are the same thing. Because of that they simply started to execute the jump table as code. It would be correct to do as you suggest `jmp qword ptr [table]` – Michael Petch Oct 13 '19 at 17:40
  • Thought so. As the linked duplicate says: "*square brackets [] mean pretty much nothing to MASM when you're just using symbols.*" Added a section about jumps to the answer to make it a better duplicate target for this. – Peter Cordes Oct 13 '19 at 17:54
  • Err, maybe not an exact duplicate. As Ross commented on his answer there, putting the table in a data section might have worked. MASM has so much ridiculous magic compared to simpler assemblers like NASM. – Peter Cordes Oct 13 '19 at 18:39
  • I got a bit further now. The only problem now is that I need to get the base address of the table to be able to add the offset to it. Using `add rax, table` results in a `constant value too large` error. Indirect jumping to a subroutine works with a fixed value. – NMITIMEN Oct 13 '19 at 19:38
  • 1
    @NMITIMEN - 64 bit immediates can only be used with mov instruction. Perhaps | mov rdx, offset table | add rax, rdx | . – rcgldr Oct 13 '19 at 21:32
  • @DavidWohlferd - I don't think LEA can use 64 bit immediates. – rcgldr Oct 14 '19 at 00:43
  • @rcgldr: yes, but you *don't want* a 64-bit absolute immediate, you want a 32-bit RIP-relative LEA to get the table address in a register. [How to load address of function or label into register in GNU Assembler](//stackoverflow.com/q/57212012) compares different ways of doing it. (Using GAS and NASM syntax, but the important points in the answer are about which machine-code encoding you want.) Also [Mach-O 64-bit format does not support 32-bit absolute addresses. NASM Accessing Array](//stackoverflow.com/q/47300844) for specifically indexing arrays. – Peter Cordes Oct 14 '19 at 01:16
  • @rcgldr: This is a MASM question. But the best machine-code way to get a static address into a register doesn't depend on the assembler, just on whether you can use a 32-bit absolute address or not. If not, then `lea r64, [symbol]` is the best, using a RIP-relative addressing mode. Using whatever syntax your assembler needs to make that happen. – Peter Cordes Oct 14 '19 at 01:20
  • @rcgldr: I don't know the MASM syntax for it; maybe look at compiler output. But you definitely want a RIP-relative LEA if a `jmp [disp32 + reg*8]` won't work, not a 64-bit absolute immediate. – Peter Cordes Oct 14 '19 at 01:50
  • A quick way to see a pattern is to compile a C `switch` statement with a few `case`es 0,1,2,... Use the -S option to see the resulting assembly code. It's always a good idea to leverage the enormous tacit knowledge encapsulated in compilers when you can. – Gene Oct 14 '19 at 03:33

1 Answer

4

Note that, as commented, MASM ignores the []. Instead, MASM goes by the type of a label. In this case, the problem is that the : after table (table:) makes it a label of type "code", which is normally used as a branch or call target, so jmp [table] or jmp table branches to table as if it were code.

Removing the : and putting the qword (dq could also be used) on the same line changes table to type qword, so jmp [table] or jmp table loads the qword address at table into RIP and does the branch as wanted.

table   qword   subroutine, subroutine2
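
Applied to the question's program, the change could look roughly like the sketch below. It is untested and makes a few assumptions that are not in the question: the table is moved to .data so it is defined before the jmp that uses it, the two subroutines are turned into separate procs, and back is declared with :: so it can be reached from outside main (the same pattern as main1:: in the examples further down). The explicit qword ptr follows Michael Petch's comment.

```masm
        .data
var     qword   10
table   qword   subroutine, subroutine2 ; no colon, so table has type qword

        .code
main    proc
        mov     rax, var
        jmp     qword ptr [table]       ; memory-indirect jump through the first entry
back::  ret                             ; :: makes back visible outside main
main    endp

subroutine proc
        mul     var                     ; rdx:rax = rax * var
        jmp     back
subroutine endp

subroutine2 proc
        mul     var
        jmp     back
subroutine2 endp

        end
```

This only ever jumps through the first table entry, like the original code; selecting an entry by index needs one of the two approaches described next.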

However, if you want to index into table, you'll either need to use a register to hold the offset of the table (like lea r9,table), or in the case of Visual Studio, go to project / properties / linker / system / enable large addresses : no (sets linker parameter /LARGEADDRESSAWARE:NO). I posted examples for both cases below.


This example works with ML64.EXE (MASM) from Visual Studio. The table can be in the code or data section. If the table is the first line in data, lea generates {4C 8D 0D 79 E5 00 00}; if the table is the first line in code, lea generates {4C 8D 0D E1 FF FF FF}. I don't know which is better for performance. It would seem that if the data cache is not being fully utilized, then it would keep a copy of the table in the data cache.

        .data
tbl     dq      fun1, fun2, fun3            ;table
        .code

main    proc
        lea     r9,tbl
        mov     rax,0
main0:  jmp     qword ptr [r9+rax*8]
main1:: inc     rax
        cmp     rax,3
        jb      main0
        xor     eax,eax
        ret
main    endp

fun1    proc
        mov     rdx,1
        jmp     main1
fun1    endp

fun2    proc
        mov     rdx,2
        jmp     main1
fun2    endp

fun3    proc
        mov     rdx,3
        jmp     main1
fun3    endp

        end

With the Visual Studio linker parameter /LARGEADDRESSAWARE:NO, there is no need to use a second register. The table can be in the data or code section. If the table is the first line in data, jmp generates {FF 24 C5 00 00 3D 00}; if the table is the first line in code, jmp generates {FF 24 C5 80 1A 2D 01}. I don't know which is better for performance. It would seem that if the data cache is not being fully utilized, then it would keep a copy of the table in the data cache.

        .data
tbl     dq      fun1, fun2, fun3            ;table
        .code
main    proc
        mov     rax,0
main0:  jmp     qword ptr [tbl+rax*8]
main1:: inc     rax
        cmp     rax,3
        jb      main0
        xor     eax,eax
        ret
main    endp

fun1    proc
        mov     rdx,1
        jmp     main1
fun1    endp

fun2    proc
        mov     rdx,2
        jmp     main1
fun2    endp

fun3    proc
        mov     rdx,3
        jmp     main1
fun3    endp

        end
rcgldr
  • Never use `mov r9,offset tbl` in 64-bit code. RIP relative LEA is better if you can't use a 32-bit destination register. Also, don't forget to explain what's wrong with the code in the question; why it *doesn't* work for using a function pointer. – Peter Cordes Oct 14 '19 at 01:23
  • @DavidWohlferd - I missed an edit to make it `lea r9,[tb]`. This example doesn't modify r9, so I leave the offset to the table in r9. – rcgldr Oct 14 '19 at 02:05
  • @rcgldr: `jmp [mem]` can use any addressing mode you want, including `[r9 + rax*8]`. But if you're hoping for a calculated jump with register-indirect and free calculation like `jmp r9 + rax`, there's no such instruction. – Peter Cordes Oct 14 '19 at 02:12
  • Where you put the table matters for performance, not correctness. Mixing it with code would waste L1I and iTLB space on data, and waste L1d + dTLB space on the surrounding code, for the line / page that contains both code and data. Either way it's a 7-byte `lea r64, [RIP + rel32]`, with a 2's complement sign-extended rel32 (little-endian of course). One of your rel32 values is positive, the other negative. No big deal. – Peter Cordes Oct 14 '19 at 02:14
  • @PeterCordes - I updated my answer to avoid the lea and just use a jump indirect, Requires a linker build switch as noted in my answer. – rcgldr Oct 14 '19 at 02:40
  • Oh. Yeah, generally comments should be removed once a post incorporates the suggestion they pointed out. (You *can* even flag obsolete comments, but it's not usually worth the mods' time for low-traffic obscure posts.) Or I can delete some of mine, and @DavidWohlferd can delete his. I might keep my comment that points out not to use `mov` 64-bit absolute, though. – Peter Cordes Oct 14 '19 at 03:04
  • 1
    @PeterCordes - I updated my answer to explain what was wrong with the question's code. I moved table to the first line of data to correspond to the generated code I showed above. MASM allows jmp [table] {FF 25 B0 7C 00 00} without the linker switch, but jmp [table+rax*8] {FF 24 C5 00 00 3D 00} requires the linker switch. – rcgldr Oct 14 '19 at 03:35
  • Yes, RIP-relative addressing is possible when the addr mode has no base or index register. – Peter Cordes Oct 14 '19 at 03:54