How to display the interrupt vector table in assembly 8086?

Question

I want my 8086 assembly program to display the interrupt vector table and then stop at the first free vector.
The question is: view the table of interrupt vectors and determine the first free vector.

I know that the first vector of the table is at address 0000h, so I tried to set the CS segment register to it, but I couldn't. I tried `mov cs,0`, and `mov bx,0` followed by `mov cs,bx`, but neither worked.
Then I tried `call cs:offset 0000h`, and again it didn't work. So how can I do it?

BOCHS's built-in debugger should be able to display / view it. But it sounds like you want to write code to *search* it. Normally for data accesses, you want to use DS or ES as the segment. Setting CS and then using loads/stores like `cmp word ptr cs:[bx], 0` would be inconvenient, because you'd need code in that segment for CS:IP to still work. — Peter Cordes, Jun 18 '21 at 13:17
`XOR SI,SI`,`MOV DS,SI`, `LES AX,[SI]`, check if ES:AX is free, otherwise `ADD SI,4` and repeat from `LES`. — vitsoft, Jun 18 '21 at 14:25
What does it mean for an interrupt vector to be "free"? Do you mean you want to test whether the interrupt handler address is set to 0000:0000? — Nate Eldredge, Jun 20 '21 at 02:45
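
vitsoft's `LES AX,[SI]` suggestion works because each IVT entry is a 4-byte far pointer stored little-endian: the offset word at 0000:4n and the segment word at 0000:4n+2, which is exactly the pair LES loads into AX and ES. The C sketch below is only a host-side model of that layout. It reads a hypothetical 1024-byte copy of the table rather than real memory, and the names `far_ptr`, `ivt_entry` and `linear_address` are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one IVT entry, i.e. what LES AX,[SI] loads. */
typedef struct {
    uint16_t offset;   /* low word at 0000:4n    -> AX */
    uint16_t segment;  /* high word at 0000:4n+2 -> ES */
} far_ptr;

/* Read vector n from a hypothetical 1024-byte image of 0000:0000..0000:03FF.
   The 8086 is little-endian, so the low byte of each word comes first. */
far_ptr ivt_entry(const uint8_t ivt[1024], unsigned n)
{
    assert(n < 256);
    const uint8_t *p = ivt + 4u * n;
    far_ptr fp;
    fp.offset  = (uint16_t)(p[0] | (p[1] << 8));
    fp.segment = (uint16_t)(p[2] | (p[3] << 8));
    return fp;
}

/* Physical address the vector points to: segment * 16 + offset,
   truncated to 20 bits because the 8086 wraps around at 1 MB. */
uint32_t linear_address(far_ptr fp)
{
    return (((uint32_t)fp.segment << 4) + fp.offset) & 0xFFFFFu;
}
```

In real 8086 code a segment register can't be an operand of OR, so a "both parts zero" test after the LES would first copy ES into a general register (e.g. `mov dx, es`) and then do `or ax, dx`.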

Sep Roland · Answer 1 · 2021-06-19T23:14:21.870


the question is : view the table of interrupt vectors and determine the first table free vector

This is a twofold question.

To display the numbers involved, you can read the Q&A Displaying numbers with DOS.
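
The formatting half can be sketched on the host in C. This is only a model that assumes the table has already been copied into a hypothetical 1024-byte buffer; under DOS you would emit the same hex digits with INT 21h as the linked Q&A describes. `format_vector` and `print_ivt` are illustrative names, not part of any library.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Format vector n of a hypothetical 1024-byte IVT copy as "NN: SSSS:OOOO"
   (all hex), i.e. segment:offset, the usual way far pointers are written.
   Returns the value snprintf returns (the length of the text). */
int format_vector(char *buf, size_t size, const uint8_t ivt[1024], unsigned n)
{
    assert(n < 256);
    const uint8_t *p = ivt + 4u * n;
    unsigned off = p[0] | (p[1] << 8);   /* word at 4n   */
    unsigned seg = p[2] | (p[3] << 8);   /* word at 4n+2 */
    return snprintf(buf, size, "%02X: %04X:%04X", n, seg, off);
}

/* Print all 256 vectors, one per line. */
void print_ivt(const uint8_t ivt[1024])
{
    char line[32];
    for (unsigned n = 0; n < 256; n++) {
        format_vector(line, sizeof line, ivt, n);
        puts(line);
    }
}
```

Printing the segment first matches the `SSSS:OOOO` notation, even though the offset is the word stored first in memory.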

To find the first slot in the Interrupt Vector Table (IVT) that contains 0:0, you can use the code below:

  xor si, si    ; Set DS:SI to the start of the IVT
  mov ds, si
  cld           ; Have LODSW increment (by 2) the SI register
Again:
  lodsw         ; The offset part of a vector
  mov dx, ax
  lodsw         ; The segment part of a vector
  or  ax, dx
  jz  Found     ; If both parts are zero, then their OR will set ZF=1 (Zero Flag)
  cmp si, 1024
  jb  Again     ; Repeat until the end of the IVT which is at address 1024
NotFound:
  ...
  jmp ..
Found:
  sub si, 4     ; -> DS:SI is the address of the first slot containing 0:0
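
To check the loop's logic off-target, here is a C model of the same scan over a hypothetical 1024-byte copy of the IVT (`first_free_vector` is an illustrative name). Like the assembly, it treats a slot as free only when offset OR segment is zero, and it turns the final SI into a vector number by dividing by 4.

```c
#include <assert.h>
#include <stdint.h>

/* Same walk as the asm over a hypothetical IVT copy: si = 0, 4, ..., 1020.
   Returns the vector number (si / 4) of the first slot containing 0:0,
   or -1 if there is none. */
int first_free_vector(const uint8_t ivt[1024])
{
    for (unsigned si = 0; si < 1024; si += 4) {
        unsigned off = ivt[si]     | (ivt[si + 1] << 8);  /* first LODSW  */
        unsigned seg = ivt[si + 2] | (ivt[si + 3] << 8);  /* second LODSW */
        if ((off | seg) == 0)       /* OR AX,DX / JZ Found */
            return (int)(si / 4);
    }
    return -1;                      /* NotFound */
}
```

In the assembly, the same conversion takes two `shr si, 1` (or `mov cl, 2` / `shr si, cl`), because the 8086 can only shift by 1 or by CL; `shr si, 2` needs an 80186 or later.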

An IVT slot that contains 0:0 is certainly free, but it is not necessarily the first free slot. Read more on this in @Margaret Bloom's answer in Find a free interrupt slot.

[EDIT]

A somewhat more elegant solution that is also shorter but slightly slower, and that clobbers one fewer register (the DX register is not used):

  xor si, si    ; Set DS:SI to the start of the IVT
  mov ds, si
Again:
  mov ax, [si]  ; The offset part of a vector
  or  ax, [si+2]; The segment part of a vector
  jz  Found     ; If both parts are zero, then their OR will set ZF=1 (Zero Flag)
  add si, 4
  cmp si, 1024
  jb  Again     ; Repeat until the end of the IVT which is at address 1024
NotFound:
  ...
  jmp ..
Found:
                ; -> DS:SI is the address of the first slot containing 0:0

And this is the idea that @Peter Cordes presented in a comment. It's 1 clock slower in the loop, but we can shave off 2 bytes by replacing the add si, 2 and sub si, 2 instructions with inc si / inc si and dec si / dec si:

  xor si, si    ; Set DS:SI to the start of the IVT
  mov ds, si
  cld
Again:
  lodsw         ; The offset part of a vector
  or  ax, [si]  ; The segment part of a vector
  jz  Found     ; If both parts are zero, then their OR will set ZF=1 (Zero Flag)
  add si, 2
  cmp si, 1024
  jb  Again     ; Repeat until the end of the IVT which is at address 1024
NotFound:
  ...
  jmp ..
Found:
  sub si, 2     ; -> DS:SI is the address of the first slot containing 0:0

edited Jun 19 '21 at 23:14

answered Jun 19 '21 at 21:43

Sep Roland


Using lodsw for both loads looks like it's actually less convenient, and is costing an extra instruction. `or ax, [si]` / `jz Found` / `inc si` / `inc si` (4 bytes not counting the JZ) would do the same job as `mov dx,ax`/ `lodsw` / `or ax,dx` (5 bytes). – Peter Cordes Jun 19 '21 at 22:23
Or keep it simple with `add si,2` for equal code-sizes but fewer instructions. (Of course, `lodsw` instead of `inc`/`inc` is even smaller, but costs another load so is worse for 8086 performance, if that's the goal instead of just code-size.) – Peter Cordes Jun 19 '21 at 22:23
My version is slower? Is code fetch + data access not the bottleneck on 8086 / 8088 for all of these, making smaller code in the loop the deciding factor? (For readability I like your mov / or version, though.) – Peter Cordes Jun 19 '21 at 23:26
@PeterCordes I've counted clocks: my 1st version (LODSW,MOV,LODSW,OR) has 12+2+12+3=29 clocks. Your version (LODSW,OR,ADD) has 12+14+4=30 clocks. Using (INC,INC) doesn't change the number of clocks (2+2) same as 4, but one byte less. – Sep Roland Jun 19 '21 at 23:30
Oh, you're using a cost model that doesn't include instruction fetch. As I eventually found while researching [Increasing Efficiency of binary -> gray code for 8086](https://stackoverflow.com/a/67403962), a good way to estimate is to count memory accesses and multiply by 4 to get a cycle count. `mov` taking only 2 cycles is relevant after a slow instruction like mul, to drain some bytes from the prefetch queue and make room for more code fetch to start, avoiding unused memory cycles. – Peter Cordes Jun 19 '21 at 23:35
8088 was more heavily bottlenecked on code-fetch, naturally, and I've read that actual 8086 wasn't always. On 8088, stuff like 2x INC was almost always a win over `add reg,2` because each `inc` still took less than 4 cycles to execute. If we're modelling the prefetch pipeline, in my version there are lots of instruction bytes after the slow `or ax, [si]` before the taken branch that discards the queue, so execution can zip through the code fetched during that `or` and probably have the queue close to empty when the loop branch is taken. (Using `loop` might be more efficient for 8086/8.) – Peter Cordes Jun 19 '21 at 23:35
Cycle counting was never even accurate on the 8086… Surely you’ve read Michael Abrams’s book on optimizing assembly language for x86, specifically the parts about “cycle eaters”? (It’s long out of print, and the dead-tree version is hard to find, but it can be found online easily, even in forms endorsed by Abrams.) You can’t count cycles on 8088 because it’s got such slow mem access. You can actually get closer by simply counting *bytes*. On 8086, you’re less bottlenecked, and have none of the fancy features of later models, so cycle-counting is accurate-ish. Still just rule of thumb. – Cody Gray - on strike Jun 20 '21 at 01:12
You really must time the code. Little else is going to be accurate. Certainly no more than the intuition of a good ASM programmer. Unfortunately, it’s never been simple. Abrams’s book has his “Zen Timer” implementation, which is very helpful for timing code sequences. – Cody Gray - on strike Jun 20 '21 at 01:13
@CodyGray In recent years I did read about those "cycle eaters" in Michael Abram's (or is it Abrash?) book, but somehow I've never really connected the dots. Probably because we've evolved away from 8086 and don't count cycles at all, these days. And contemporary x86 has become so complex that timing now is the only option left. – Sep Roland Jun 20 '21 at 01:27
Abrash, of course. Typing on mobile, so didn’t look it up to avoid a faulty memory. In modern times, it’s enough to just optimize by reducing memory accesses. There are some minor differences in instruction sequences, but almost all are overshadowed by mem access times, and it only matters in the tightest of tight loops. – Cody Gray - on strike Jun 20 '21 at 02:02
@SepRoland: Yeah, the "cycle costs" I'd googled for the initial version of [my answer on that earlier question](https://stackoverflow.com/a/67403962) were really not sounding right based on everything else I'd read about 8086 and especially 8088. Counting total memory accesses and multiplying by 4 is plenty simple, and good enough, especially if you don't truly care about one long-obsolete member of the x86 family of CPUs this code can run on! Especially with some guesstimating of whether the prefetch queue might fill up or empty. – Peter Cordes Jun 20 '21 at 02:31
I guess 8088 is the slowest CPU your code could run on, so tuning for that ensures acceptable performance everywhere. (As if this loop could ever run often enough for its perf to matter... So you could just go for code-size and use `lodsw` instead of `add si,2` or inc/inc, since SI is known not to point at I/O memory or anything). Although I find it more interesting to try to avoid perf pot-holes across many uarches (e.g. avoid partial register stalls on P6) if that's possible without making it worse for 8086. e.g. use xor-zeroing / mov al, xyz instead of mov al, xyz / mov ah,0. – Peter Cordes Jun 20 '21 at 02:35
Also if you really want to optimize for code-size, when you want to use the result, use `mov [si-2], ...` instead of using `sub si,2`. Or use `lea di, [si-2]` to prepare for `stosw`. – Peter Cordes Jun 20 '21 at 02:37
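
Peter Cordes's rule of thumb (count bus accesses, including instruction fetch, and multiply by 4) is easy to apply by hand. The C function below is a rough sketch of that heuristic, not a cycle-accurate model: it ignores alignment, the prefetch-queue state and the queue flush on taken jumps, and `estimate_clocks` is an illustrative name. The byte counts used with it are a count of the usual encodings of the loop bodies from `Again:` to `jb Again`, assuming short jumps: 14 bytes for LODSW/MOV/LODSW/OR, 13 bytes for LODSW/OR [SI] with INC/INC, and 16 bytes for the MOV/OR [SI+2]/ADD SI,4 version, each reading two words of data per iteration.

```c
#include <assert.h>

/* Rough "bus accesses x 4" estimate for one loop iteration (a heuristic,
   not cycle-accurate). bus_bytes is the data-bus width: 2 for the 8086,
   1 for the 8088. Instruction bytes and data bytes both cross the bus,
   and each bus cycle takes at least 4 clocks. */
unsigned estimate_clocks(unsigned code_bytes, unsigned data_bytes,
                         unsigned bus_bytes)
{
    assert(bus_bytes == 1 || bus_bytes == 2);
    unsigned code_cycles = (code_bytes + bus_bytes - 1) / bus_bytes;
    unsigned data_cycles = (data_bytes + bus_bytes - 1) / bus_bytes;
    return 4 * (code_cycles + data_cycles);
}
```

With those counts, `estimate_clocks(14, 4, 2)` gives 36 and `estimate_clocks(14, 4, 1)` gives 72; the 13-byte INC/INC loop gives 36 and 68; the 16-byte loop gives 40 and 80. All are above the 29 or 30 clocks from execution-time tables alone, and only the 8088 rewards the one-byte saving, which fits the points made in the comments about fetch being the bottleneck.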
