You actually just want to call ArithmeticError with AX = one or two bits, right? The returning-from-JO is just an X-Y problem in your implementation.
There are a few strategies you can use. One simple one is to get FLAGS into AX and test the two bits there. (https://en.wikipedia.org/wiki/FLAGS_register has the layout.) We can leave it to ArithmeticError to sort out the bit positions in the rare case we actually have an error, instead of doing extra work every time.
pushf
pop ax ; OF = 0x0800 CF=0x0001
and ax, 0x0801 ; zero the other bits, leaving only OF and CF
jnz ArithmeticError ; tail-call / jump if either were set
back:
ret
ArithmeticError:
; at the start of ArithmeticError, if you want to shuffle 0x0801 to 0x03
shr ah, 2 ; 0x8 -> 0x2
or al, ah ; AL = 0 0 0 0 ' 0 0 OF CF. (AH can be non-zero)
... do your printing or whatever
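One caveat if the target behaves like a real 8086/8088: shifts by an immediate count other than 1 only arrived with the 80186, so shr ah, 2 might not assemble or run there. I haven't checked what msx88 accepts, so treat that as an assumption. A sketch of the same decode using only 8086 encodings:

```asm
ArithmeticError:
        ; entry: AX = FLAGS & 0x0801, so AH = 0x08 if OF was set, AL = CF
        shr  ah, 1          ; 0x08 -> 0x04
        shr  ah, 1          ; 0x04 -> 0x02
        or   al, ah         ; AL = 0 0 0 0 ' 0 0 OF CF  (AH can still be non-zero)
        ; ... do your printing or whatever
```

Two shift-by-1 instructions cost 4 bytes; mov cl, 2 / shr ah, cl is the other 8086 option if CL is free.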
Jumping to ArithmeticError lets it use our return address, not coming back to us. (This is an optimized tailcall.) Otherwise, in the common case, jnz falls through so we reach our own ret. It comes out the same as doing this, because call foo / ret is equivalent to jmp foo.
...
jz no_error ; equivalent without optimized conditional tailcall
call ArithmeticError
no_error:
ret
inc leaves CF unmodified in x86. I'm assuming that's also true in msx88. That could be useful if we ever wanted to use inc ax ; inc ax to increment by two.
But mov doesn't touch FLAGS at all, so we can just mov ax,2 if we're more interested in simplicity than code size (see footnote 1).
mov al, 0
jno no_overflow
mov al, 2 ; AL = 2 if OF was set
no_overflow:
adc al, 0 ; AL += 0 + CF, setting the low bit or leaving unmodified
jnz ArithmeticError ; use FLAGS set according to AL, by ADC
ret
Footnote 1: inc ax is a 1-byte instruction in 16-bit mode, vs. 3 for mov ax, 0 or mov ax,2. When optimizing for ancient 8088 CPUs, you would use inc ax twice (2 bytes total) instead of add ax,2 (3 bytes) because code-fetch was the primary bottleneck.
But to save bytes, we used al instead of ax. You can zero-extend into AX if you want in the ArithmeticError case, or change the instructions to use AX directly.
If we have setcc (386 and later), we can turn a FLAGS condition into an integer 0 / 1 in a register. Without it, you might start with 8086 lahf (Load AH from FLAGS) to get CF into the bottom of AH, before doing something with OF (which is unfortunately outside the low 8 bits of FLAGS). lahf / and ah,1 emulates setc ah, except it also writes FLAGS so you'd have to wait until after branching on OF.
; kinda clunky
setc ah ; AH = CF ; optionally use CL or DL to avoid partial-register stalls or false dependencies
seto al ; AL = OF
add al, ah ; AL = 1 or 2 if either or both bits were set
; we'll have to decode using AH to recover which one
jnz ArithmeticError
ret
; assuming that no-error is vastly more common than either carry or overflow
; we made the branching as cheap as possible, leaving more decode work for this
ArithmeticError:
sub al, ah ; undo the add, back to AH=CF, AL=OF
shl al, 1 ; AL = 2*OF
add al, ah ; CF + 2*OF
; if I'd used separate regs like dl and al, could have used lea eax, [eax + 2*edx]
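Without setcc, the lahf idea mentioned earlier might look something like this. It's an untested sketch: I'm assuming msx88 supports lahf like a real 8086, and the label name lahf_overflow is made up for illustration. It produces the same AL = 2*OF + CF encoding as the other versions.

```asm
        lahf                    ; AH = low byte of FLAGS (CF is bit 0); modifies no flags
        mov  al, ah             ; mov doesn't touch FLAGS either, so OF is still intact
        jo   lahf_overflow      ; branch on OF before the AND below overwrites FLAGS
        and  al, 1              ; AL = CF, and ZF = 1 if that's zero
        jnz  ArithmeticError    ; carry only: AL = 1
        ret
lahf_overflow:
        and  al, 1              ; AL = CF
        or   al, 2              ; AL = 2 + CF
        jmp  ArithmeticError
```

The and is needed in both paths because lahf also copies SF, ZF, AF, PF and the always-set bit 1 of FLAGS. AH is left holding that raw flags byte, so ArithmeticError shouldn't assume AH is zero.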
Somewhat efficient assuming errors are rare: two not-taken branches
jo overflow
mov al, 0 ; only zero AL if mov al, 2 isn't going to run
jc carry
ret
overflow:
mov al, 2
jnc nocarry ; alternative: ADC al,0 as in the earlier version
carry:
inc ax ; like OR al,1 since AL=0 or 2 before this
nocarry:
jmp ArithmeticError ; could be a call, if you want to put stuff after
This ends up checking CF in two places, but that's because we want to keep the fast path as minimal as possible, just two not-taken branches. (And a mov al,0.)
You could take this even further and do more work sorting things out in the overflow: and carry: blocks, so the fast path is just jc / jo. Then the carry block can't assume any register state was already set up. You might just use both halves of AH:AL separately. If errors are rare, the extra code required will still run a negligible number of times.
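A sketch of that variant, with nothing but the two branches on the fast path. I've kept the AL = 2*OF + CF encoding rather than splitting across AH:AL, and the carry_only label is just a name I picked.

```asm
        jo   overflow           ; fast path: only two not-taken branches
        jc   carry_only
        ret
overflow:
        mov  al, 2              ; mov leaves CF intact
        adc  al, 0              ; AL = 2 + CF
        jmp  ArithmeticError
carry_only:                     ; only reachable with OF=0 and CF=1
        mov  al, 1
        jmp  ArithmeticError
```

Here carry_only sets AL from scratch because the fast path no longer zeroes it.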
But if you're replicating this for every operation that you want error checking for, it makes sense to tighten it up.
Putting the mov al,0 between the two branches spaces them out, which may help branch predictors in some CPUs. Maybe not important if they're usually both not-taken. But it may help on in-order pipelines, especially P5 Pentium, which can pair jcc in the V pipe (see Agner Fog's microarch guide and instruction tables). (ret is not pairable, but near call is.) Of course modern x86 does out-of-order exec, but producing correct predictions for dense branches can still be a problem for the front-end.
We're not depending on a conditional tailcall, so we can call ArithmeticError (instead of jumping to it) if we want to inline this block into functions instead of having a ret after the jc. Also, even before 386 jcc rel16, we aren't limited to a [-128,+127] branch displacement to reach ArithmeticError with jcc rel8. JMP does have a 16-bit displacement version even in 8086.
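As a rough illustration of inlining, here is a hypothetical add_checked function. Everything about its interface is my assumption: it keeps the sum in AX, so the error code goes in DL instead of AL, and it assumes ArithmeticError takes the code in DL and returns normally.

```asm
add_checked:                    ; hypothetical: AX += BX, reporting carry/overflow
        add  ax, bx
        jo   ac_overflow        ; rel8 is enough: the cold block is nearby
        mov  dl, 0              ; mov leaves CF intact for the jc
        jc   ac_carry
ac_done:
        ; ... the rest of the function would continue here ...
        ret
ac_overflow:
        mov  dl, 2
        jnc  ac_report
ac_carry:
        inc  dx                 ; DL: 0 -> 1 or 2 -> 3; inc leaves CF alone
ac_report:
        call ArithmeticError    ; near call with a 16-bit displacement, any distance
        jmp  ac_done            ; come back to the normal path afterwards
```

Only the short jo / jc / jnc branches need to stay within rel8 range, and they target the local cold block rather than ArithmeticError itself.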