x86 asm casetable implementation

Question

i'm having issues in understanding casetable in asm x86. My professor has already explained it in the slides using this example:

.data
CaseTable BYTE 'A'  ; lookup value
DWORD Process_A ; address of procedure
EntrySize = ($ - CaseTable)
BYTE 'B'
DWORD Process_B
BYTE 'C'
DWORD Process_C
BYTE 'D'
DWORD Process_D
NumberOfEntries = ($ - CaseTable) / EntrySize


mov ebx,OFFSET CaseTable    ; point EBX to the table
mov ecx,NumberOfEntries ; loop counter

L1: cmp al,[ebx]    ; match found?
jne L2  ; no: continue
call NEAR PTR [ebx + 1] ; yes: call the procedure
    ; +1: address after the byte        
jmp Default ; and exit the loop
L2: add ebx,EntrySize   ; point to next entry
loop L1 ; repeat until ECX = 0

Default:

however, this code is incomplete and i don't know how to use it to build my own caseetable. I would appreciate it if someone can give me one working case example implementing casetable using the above code, and show me specifically when does he call the procedures exactly in the code? I would appreciate it a lot. I will use the example to learn how to implement more and other cases. Thank you.

_"however, this code is incomplete and i don't know how to use it to build my own caseetable"_ - then it's time to go back to the textbook and/or class notes, and talk to the professor. We aren't going to do the work for you. — Jim Garrison, Dec 12 '17 at 18:28
case-tables are usually used for ranges, where simple arithmetic `(value-base_value)*ptr_size` points directly to correct jump address. Having additional byte of test value is not that common, then multi-branching code with hardcoded `cmp al,'A'` is better (performance wise and simpler to emit for compiler). — Ped7g, Dec 12 '17 at 23:11

score 1 · Answer 1 · answered Dec 13 '17 at 04:48

You are making this over-complicated. You don't need a linear search or keys stored in your table; just range-check your value and then use it as a table index.

I think you're using MASM syntax, so I'll try to write this in MASM syntax, but I may have the syntax wrong. The actual instructions and logic should be correct, though.

section .rdata    ; read-only data on Windows
  CaseTable:
    DWORD Process_A, Process_B   ; function pointers
    DWORD Process_C, Process_D
  NumberOfEntries = ($ - CaseTable) / 4
  ; optional: define constants for 'A' and 'D' and use those in the code below
  ; so the keys / values are still all in one place in the source.


.text   ; or .code or something.
        ; You were missing a section directive between your data and code.

; input: selector in EAX
dispatcher:        ; you were also missing a label for your function

    ; movzx  eax, al   ; if your selector really was just a byte
    sub   eax, 'A'         ; convert to idx. values below 'A' wrap to high unsigned
    cmp   eax, 'D' - 'A'   ; NumberOfEntries
    ja   @Default          ; unsigned compare rejects out-of-range high or low
    call  [CaseTable + eax*4]
    ; then fall through.  Use  jmp  as a tail-call if you don't want that.

 @Default:
    ret

The trick to writing nice (and efficient) asm is to see through the problem to how simple it really is. You have to manually take advantage of any special-case situation, like your keys all being contiguous values. You are the compiler. :)

The function pointers should point to other functions, like

Process_A:
    mov   eax, [esp+4]   ; return first arg unchanged
    ret

Process_B:
    mov   eax, [esp+4]
    add   eax, eax       ; return n * 2
    ret

Process_C:
    mov   eax, [esp+4]
    lea   eax, [eax + eax*2]   ; return n * 3
    ret

Process_D:
    mov   eax, [esp+4]
    shl   eax, 2       ; return n * 4
    ret

Obviously you wouldn't use a dispatch table for this, you'd just use imul to multiply by an unknown number from 1 to 4. But this is just an example.

Compilers know a lot of cool tricks for optimizing switch/case statements. One of my favourites is when a lot of case labels do the same thing, clang and gcc will use an immediate bitmap to test for any of those cases in parallel:

void errhandler(enum errtype numError) {
    switch (numError) {  
      case ERROR_01 :  // intentional fall-through
      case ERROR_07 :  // intentional fall-through
      case ERROR_0A :  // intentional fall-through
      case ERROR_10 :  // intentional fall-through
      case ERROR_15 :  // intentional fall-through
      case ERROR_16 :  // intentional fall-through
      //case ERROR_20 :  // keep the range of cases smaller for simpler 32-bit code
         fire_special_event();
         break;

      default:
        // error codes that require no additional action
        break;       
    }
}

Compiles to code like this (clang4.0.1 -O3 -m32 on the Godbolt compiler explorer)

errhandler(errtype):                 # @errhandler(errtype)
    mov     eax, dword ptr [esp + 4]    # load first function arg
    cmp     eax, 22
    ja      .LBB0_2
    mov     ecx, 6358146    # 0x610482 is a bitmap of those error codes
    bt      ecx, eax
    jae     .LBB0_2         # aka JNC: jump if CF=0, i.e. the bit wasn't set, i.e. ((1<<eax) & ecx) was false
    jmp     fire_special_event() # TAILCALL
.LBB0_2:
    ret

Unfortunately compilers aren't smart enough to use jcc as a conditional tail-call, so instead they conditionally jump over the jmp :/

gcc chooses to use a mov eax, 1 / shl instead of using bt, even when tuning for CPUs where bt would be faster :/

x86 asm casetable implementation

1 Answers1

Linked