
While writing a routine in x86 assembly for a boot loader, I ran into a bug: when a division error happened, the program got stuck in an infinite loop. While investigating, I found that executing int 0 went through the exception handler normally and then continued with the rest of the program. But when I wrote my own exception handler, the return address pushed for a real division error exception was the address of the div instruction itself. As a result, the division executed over and over, looping forever. Is this normal behavior, or a bug specific to VirtualBox or my CPU?

org 0x7c00      ;put all label addresses at offset 0x7c00

xor ax, ax      ;set up all segment registers
mov ds, ax
mov ax, 0x9000
mov ss, ax
mov sp, 0x1000
mov ax, 0xB800  ;video text memory starts at this address
mov es, ax

mov ah, 0x00
mov al, 0x02
int 0x10        ;go into 80x25 monochrome text

mov [0x0000], word DivideException   ;IVT entry 0 (#DE): offset of the handler
mov [0x0002], word 0x0000            ;IVT entry 0: segment of the handler

xor di, di
xor bx, bx

;int 0   ;this and the divide CX below will cause a division error exception

mov ax, 0
mov cx, 0 ;when the exception is handled it prints out
div cx    ;"a divide by zero error happened 0000:7C2D 0000:7C2F"
          ;the first address is the division instruction and the second one is 2 bytes after it
          ;when int 0 is uncommented, both addresses are the same
jmp $

ToHex:
push bp
mov bp, sp
push bx

mov ax, word [bp+6]
mov bx, word [bp+4]
add bx, 3
mov cx, 16

.Loop:
xor dx, dx
div cx
add dx, 48
cmp dx, 58
jb .Skip
add dx, 7
.Skip:
mov byte [bx], dl
dec bx
cmp ax, 0
jne .Loop

.Ret:
pop bx
mov sp, bp
pop bp
ret

PrintStr:
push bp
mov bp, sp
push bx

mov bx, word [bp+6]
mov ah, byte [bx]
mov bx, word [bp+4]

.PrintLoop:
mov al, byte [bx]

mov word [es:di], ax
inc di
inc di
inc bx
cmp byte [bx], 0x00
jne .PrintLoop

pop bx
mov sp, bp
pop bp
ret

DivideException:
push bp
mov bp, sp
push bx

push word ColorAttributes1
push word String3
call PrintStr
add sp, 4

push word [bp+4]
push word String1
call ToHex
add sp, 4

push word [bp+2]
push word String2
call ToHex
add sp, 4

push word ColorAttributes1
push word String1
call PrintStr
add sp, 4                ;pop the two arguments so "pop bx" below restores the saved BX

push ds
mov ds, word [bp+4]
mov bx, word [bp+2]

cmp byte [ds:bx], 0xF7  ;checks if there's a 0xF7 byte at the return address
jne .DontAdd            ;for some reason the return address when calling int 0
add word [bp+2], 2      ;directly is the address after the instruction, while
.DontAdd:               ;causing a divide error exception through division will
pop ds                  ;put the return address at the division, leading to an
                        ;infinite loop
push word [bp+4]
push word String1
call ToHex
add sp, 4

push word [bp+2]
push word String2
call ToHex
add sp, 4

push word ColorAttributes1
push word String1
call PrintStr

add sp, 4

pop bx
mov sp, bp
pop bp
iret



String1: db "0000:"     ;no terminator on purpose: PrintStr runs on into String2
String2: db "0000 ", 0x00
String3: db "a divide by zero error happened ", 0x00
ColorAttributes1: db 0x0F ;high nibble is background color,
                          ;low nibble is foreground color


times 510-($-$$) db 0  ;fills the rest with 0's until 510 bytes
dw 0xAA55               ;magic boot sector number
Guber
  • Please show a code sample. – Victor Feb 10 '22 at 19:16
  • Sorry if the code is extremely messy. If I didn't overwrite the exception handler, it would go into an infinite loop. – Guber Feb 10 '22 at 19:26
  • @Victor: Not really necessary in this case; the question was already specific enough. (And the thing being asked about is in fact the crux of the matter, not a symptom of some bug in some code not shown.) – Peter Cordes Feb 10 '22 at 20:09

1 Answer


The original 8086/8088 does push the address of the following instruction for #DE exceptions.
But all other x86 CPUs push the start address of the faulting div/idiv instruction. (At least starting from the 386; the 286 very likely behaves the same as the 386.)

That's normal for x86 in general: faulting instructions push the address of the instruction that faulted. x86 machine code can't be reliably/unambiguously decoded backwards, so the design intent is that the exception handler can examine the situation and potentially repair it, and re-run the faulting instruction.

See Intel x86 - Interrupt Service Routine responsibility, which breaks down the differences between Faults, Traps, and Aborts, and even specifically mentions the difference between int 0 and a faulting div.

Re-running is useful for #PF page faults, although repairing and retrying is less realistic for things like FP and integer arithmetic exceptions. But even if the handler can't repair anything, it can at least report the actual instruction that faulted. For example, idiv dword [fs: rdi + 0xf1f7f1f7] would be ambiguous to disassemble backwards: the f7 f1 bytes in the disp32 are the encoding for div ecx. You also wouldn't know whether a jump had landed directly on the idiv opcode, after the FS prefix. So having the address of the start of the faulting instruction, not its end, is definitely useful for debugging and possibly other purposes.
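To make that ambiguity concrete, here are the bytes of that instruction written out by hand. The encoding below is my own derivation from the F7 /7 opcode and ModRM rules, so treat it as a sketch. `faulting_idiv` and `after_idiv` are illustrative labels only. A handler that only knew the end address `after_idiv` could not tell which of several valid decodings actually ended there.

```nasm
bits 64

; idiv dword [fs: rdi + 0xf1f7f1f7], spelled out as raw bytes:
;   64            FS segment-override prefix
;   F7 BF         opcode F7, ModRM BF = mod 10 (disp32), reg /7 (IDIV), r/m 111 (RDI)
;   F7 F1 F7 F1   disp32 = 0xf1f7f1f7, little-endian
faulting_idiv:
    db 0x64, 0xF7, 0xBF, 0xF7, 0xF1, 0xF7, 0xF1
after_idiv:

; Other valid decodings that also end exactly at after_idiv:
;   from faulting_idiv+1: F7 BF F7 F1 F7 F1  = idiv dword [rdi + disp32] (no FS)
;   from faulting_idiv+2: BF F7 F1 F7 F1     = mov edi, 0xf1f7f1f7
;   from faulting_idiv+3: F7 F1 / F7 F1      = div ecx ; div ecx
;   from faulting_idiv+5: F7 F1              = div ecx (ModRM F1 = mod 11, /6 DIV, ECX)
```

Assembling this with `nasm -f bin` just emits the seven bytes; the point is the comment table, which shows why the CPU reports the start address instead of leaving the handler to guess.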

int 0 (if the IDT allows it, when you're not in real mode) pushes the CS:[ER]IP of the following instruction, of course: it isn't something that could be re-run without raising the exception again after the situation is repaired. int in general is intended to work like call in terms of returning to the next instruction.
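Because #DE is a fault, a handler that wants execution to continue after the div has to work out the instruction's length itself, instead of assuming 2 bytes as the question's handler does. Here is a minimal sketch for 16-bit real mode on a CPU newer than the 8086/8088. It assumes the faulting instruction is a plain div/idiv (opcode F6 or F7) with a 16-bit ModRM and no prefixes (no segment override, no 66h/67h), and that nothing ever executes int 0 directly, because the handler cannot tell the two apart. `DivSkipHandler` is a hypothetical name. It fakes a result of 0 (AX, plus DX for the 16-bit form) and advances the saved IP past the instruction.

```nasm
bits 16

; Hypothetical real-mode #DE handler: step past a faulting DIV/IDIV.
; Assumes: 286+ fault semantics (saved IP = start of the DIV), no prefixes.
DivSkipHandler:
    push bp
    mov bp, sp              ; [bp+2] = IP, [bp+4] = CS, [bp+6] = FLAGS
    push ds
    push si
    push bx
    push cx

    lds si, [bp+2]          ; DS:SI = saved CS:IP = first byte of the DIV/IDIV
    mov bh, [si]            ; opcode: F6 = 8-bit operand, F7 = 16-bit operand
    mov cl, [si+1]          ; ModRM byte
    mov bl, bh
    and bl, 0xFE            ; fold F7 into F6
    cmp bl, 0xF6
    jne .done               ; unexpected byte (e.g. a prefix): IP unchanged, would still loop

    mov si, 2               ; length so far: opcode + ModRM
    mov ch, cl
    and ch, 0xC0            ; isolate the mod field (bits 7-6)
    cmp ch, 0xC0
    je .skip                ; mod = 11: register operand, no displacement
    cmp ch, 0x40
    je .disp8               ; mod = 01: 8-bit displacement
    cmp ch, 0x80
    je .disp16              ; mod = 10: 16-bit displacement
    and cl, 0x07            ; mod = 00: only r/m = 110 has a (16-bit) displacement
    cmp cl, 0x06
    jne .skip
.disp16:
    inc si
.disp8:
    inc si
.skip:
    add [bp+2], si          ; return to the instruction after the DIV/IDIV
    xor ax, ax              ; fake result: quotient 0, remainder 0 (AL/AH for F6)
    cmp bh, 0xF7
    jne .done
    xor dx, dx              ; F7 form: the remainder lives in DX
.done:
    pop cx
    pop bx
    pop si
    pop ds
    pop bp
    iret                    ; restores the caller's FLAGS and the (adjusted) CS:IP
```

To try it, install it the same way the question installs DivideException, with DS = 0: `mov word [0x0000], DivSkipHandler` and `mov word [0x0002], 0`. A `div cx` with CX = 0 should then fall through to the next instruction with AX = DX = 0. Decoding the ModRM byte is what makes memory operands such as `div word [bx+si+0x1234]` (4 bytes) work, where a fixed 2-byte skip would land in the middle of the displacement.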


The 8086 behaviour appears to have been an intentional decision to simplify the hardware at the expense of worse behaviour. It has no limit on max instruction length, and avoids remembering the start of an instruction at all, anywhere inside the CPU (Ken Shirriff quotes an Intel patent in an answer on Interrupts, Instruction Pointer, and Instruction Queue in 8086).

If cs rep movsb is interrupted by an external interrupt, the interrupt-return address is before the final prefix, not the actual instruction start. (i.e. it would resume as rep movsb without the cs prefix, which is a disaster if you put the prefixes in that order. This is the biggest "worse behaviour"; you can work around it by putting rep cs movsb inside a loop.) Since 8086 doesn't have any kind of page-faults or configurable segment-limits, it can't take a synchronous exception during rep cs movsb or other rep-string instructions, only async external interrupts.

See Why do call and jump instruction use a displacement relative to the next instruction, not current? for more guesswork about 8086 design decisions.

Peter Cordes
  • So is the BIOS failing to handle a divide error, causing it to go into an infinite loop? – Guber Feb 10 '22 at 20:44
  • @Guber: I wouldn't call it any kind of failure on the BIOS's part, but yeah if it happens to set up an IVT entry pointing at an `iret`, that would make an infinite loop. Even that isn't guaranteed by anything, AFAIK; you could just as easily have execution go off into the weeds and create a mess. There's definitely no reason to expect execution to continue with the instruction *after* a faulting `div`, if you didn't install your own interrupt handler that specifically steps past it (perhaps after zeroing EDX:EAX if the dividend wasn't already zero.) – Peter Cordes Feb 10 '22 at 21:33
  • I rewrote the exception handler to say that a divide error has happened and then to print out a register dump. I think this will be a very useful tool when I try to write more boot loader programs. – Guber Feb 11 '22 at 18:05
  • "I don't know what 286 did" My 1989 book, "80386 Programmer's Reference Manual", only mentions the following 4 differences between the real-address modes of the 286 and the 386: bus lock on the whole of memory, first instruction at 00FFFFF0h, possibly different initialization of general registers, and MSW initialized at FFF0h. Logically, I would deduce that the 286 behaves just like the 386 for #DE. – Sep Roland Dec 25 '22 at 23:36