You are making this over-complicated. You don't need a linear search or keys stored in your table; just range-check your value and then use it as a table index.
I think you're using MASM syntax, so I'll try to write this in MASM syntax, but I may have the syntax wrong. The actual instructions and logic should be correct, though.
section .rdata ; read-only data on Windows
CaseTable:
DWORD Process_A, Process_B ; function pointers
DWORD Process_C, Process_D
NumberOfEntries = ($ - CaseTable) / 4
; optional: define constants for 'A' and 'D' and use those in the code below
; so the keys / values are still all in one place in the source.
.text ; or .code or something.
; You were missing a section directive between your data and code.
; input: selector in EAX
dispatcher: ; you were also missing a label for your function
; movzx eax, al ; if your selector really was just a byte
sub eax, 'A' ; convert to idx. values below 'A' wrap to high unsigned
cmp eax, 'D' - 'A' ; NumberOfEntries
ja @Default ; unsigned compare rejects out-of-range high or low
call [CaseTable + eax*4]
; then fall through. Use jmp as a tail-call if you don't want that.
@Default:
ret
The trick to writing nice (and efficient) asm is to see through the problem to how simple it really is. You have to manually take advantage of any special-case situation, like your keys all being contiguous values. You are the compiler. :)
The function pointers should point to other functions, like
Process_A:
mov eax, [esp+4] ; return first arg unchanged
ret
Process_B:
mov eax, [esp+4]
add eax, eax ; return n * 2
ret
Process_C:
mov eax, [esp+4]
lea eax, [eax + eax*2] ; return n * 3
ret
Process_D:
mov eax, [esp+4]
shl eax, 2 ; return n * 4
ret
Obviously you wouldn't use a dispatch table for this, you'd just use imul
to multiply by an unknown number from 1 to 4. But this is just an example.
Compilers know a lot of cool tricks for optimizing switch
/case
statements. One of my favourites is when a lot of case labels do the same thing, clang and gcc will use an immediate bitmap to test for any of those cases in parallel:
void errhandler(enum errtype numError) {
switch (numError) {
case ERROR_01 : // intentional fall-through
case ERROR_07 : // intentional fall-through
case ERROR_0A : // intentional fall-through
case ERROR_10 : // intentional fall-through
case ERROR_15 : // intentional fall-through
case ERROR_16 : // intentional fall-through
//case ERROR_20 : // keep the range of cases smaller for simpler 32-bit code
fire_special_event();
break;
default:
// error codes that require no additional action
break;
}
}
Compiles to code like this (clang4.0.1 -O3 -m32
on the Godbolt compiler explorer)
errhandler(errtype): # @errhandler(errtype)
mov eax, dword ptr [esp + 4] # load first function arg
cmp eax, 22
ja .LBB0_2
mov ecx, 6358146 # 0x610482 is a bitmap of those error codes
bt ecx, eax
jae .LBB0_2 # aka JNC: jump if CF=0, i.e. the bit wasn't set, i.e. ((1<<eax) & ecx) was false
jmp fire_special_event() # TAILCALL
.LBB0_2:
ret
Unfortunately compilers aren't smart enough to use jcc as a conditional tail-call, so instead they conditionally jump over the jmp
:/
gcc chooses to use a mov eax, 1
/ shl
instead of using bt
, even when tuning for CPUs where bt
would be faster :/