3

I recently wrote an x86 'bootloader' program that shows the values of the hardware registers after the BIOS jumps to my program. For the purpose of testing, I set the AX register to a known value to ensure that the program runs correctly.

BITS 16
%macro pad 1-2 0
    times %1 - ($ - $$) db %2
%endmacro
[org 0x7C00]
    CLD                 ; clear direction flag (forward direction)
    CLI                 ; clear interrupt flag (disable interrupts, opposite of 65xx)
    
    MOV [0x8000], AX    ; display all registers,
    MOV [0x8004], BX    ;   including stack,
    MOV [0x8008], CX    ;   segment, & extra
    MOV [0x800C], DX    ;   registers

    MOV [0x8010], SP
    MOV [0x8014], BP
    MOV [0x8018], SI
    MOV [0x801C], DI
    
    MOV [0x8020], CS
    MOV [0x8024], SS    ; we also display DS register,
    MOV [0x8028], ES    ;   so we can't modify it or
    MOV [0x802C], DS    ;   we'll lose our data
    
    MOV [0x8030], FS
    MOV [0x8034], GS
    
    MOV AX, 0x0123      ; write 0x0123 to [0x8000]
    MOV [0x8000], AX    ;   for debugging
    
    MOV DI, 0x804C      ; DI is pointer to address 0x804C
                        ; (temporary data)
    MOV AH, 0x02
    MOV BH, 0x00        ; video page 0?
    MOV DX, 0x0401
    INT 0x10            ; move cursor to XY:($01, $04)

    ; display register data
    MOV AL, 'A'
    CALL printXl        ; print 'AX:'
    MOV DX, [0x8000]    ; recall value of AX register
                        ;   (set to 0x0123 for test)
    CALL printascii     ; print 16-bit value @ [0x8000]
    
    ;...                ; omitted code: display other registers
    
    MOV AH, 0x00        ; wait for keyboard press
    INT 0x16
    
    INT 0x18            ; boot Windows

printXl:
    MOV AH, 0x0E
    XOR BX, BX
    INT 0x10            ; display character in 'AL'
    MOV AL, 'X'
    ; falls through
prnt:                   ; referenced in omitted code
    MOV AH, 0x0E
    INT 0x10            ; display character 'X'/'S'
    MOV AL, ':'
    INT 0x10            ; display character ':'
    RET
    
printascii:
    MOV [DI], DX            ; store value for later recall
    MOV AH, 0x0E            ; INT 10,E
    
    MOV SI, hexascii        ; load address of 'hexascii'
    AND DX, 0xF000
    SHR DX, 0x0C            ; shift high nibble to lowest 4 bits
    ADD SI, DX
    CS LODSB                ; AL = CS:[0x1EE + DX >> 12];
    INT 0x10                ; display high nibble of character value
            
    MOV SI, hexascii
    MOV DX, [DI]
    AND DX, 0x0F00
    SHR DX, 0x08
    ADD SI, DX
    CS LODSB
    INT 0x10                ; display low nibble of character value
            
    MOV SI, hexascii
    MOV DX, [DI]
    AND DX, 0x00F0
    SHR DX, 0x04
    ADD SI, DX
    CS LODSB
    INT 0x10                ; display high nibble of character value
            
    MOV SI, hexascii        ;
    MOV DX, [DI]
    AND DX, 0x000F
    ADD SI, DX
    CS LODSB
    INT 0x10                ; display low nibble of character value
            
    RET
pad 0x01EE
hexascii:
    db "0123456789ABCDEF"   ;
    
pad 0x01FE                  ; pad to end of bootsector
    dw 0xAA55               ; bootsector signature

When running from DOSBox, I correctly see AX:0123, but when booting from my floppy disk, I get AX:FFFF. I have no idea what I'm doing wrong. For reference, my PC is an Intel Core 2 Quad.

Peter Cordes
はるき
  • 5
    You use `[org 0x7C00]` but you never initialize DS to `0`. Some BIOSes might enter your MBR with CS=7c0, IP=0, and who knows what for DS. – Peter Cordes Sep 25 '20 at 19:11
  • 1
    `int 20h` is not a ROM-BIOS interrupt service. – ecm Sep 25 '20 at 20:09
  • 1
    You are also not showing us the "pad" macro. (Or perhaps it is built-in to your assembler? This `lea` line does not seem like valid NASM syntax so it depends on which assembler you use.) And the padding is wrong anyway, should pad to 510 (1FEh). – ecm Sep 25 '20 at 20:10
  • 2
    A possible problem: The stack as initialised by the ROM-BIOS loader may overlap your register storage area. – ecm Sep 25 '20 at 21:57
  • Possible duplicate of [Boot loader doesn't jump to kernel code](https://stackoverflow.com/a/32705076), Michael Petch's general tips for bootloaders. Re: memory layout and using `0x8000` addresses; that *may* be safe if DS = 0. [Understanding of boot loader assembly code and memory locations](https://stackoverflow.com/q/52127149) shows a memory map for the bootloader from MikeOS. – Peter Cordes Sep 25 '20 at 22:26
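
To make the first comment concrete: in real mode a memory operand such as `[0x8000]` is addressed through DS, so its linear address is DS * 16 + 0x8000. The sketch below works that out with NASM constant expressions for three DS values. The values are illustrative assumptions only (nothing in the question says what DS the BIOS actually passed), and the `STORE_OFS` and `LIN_*` names are made up for this example.

```nasm
; Worked arithmetic only: assembles with "nasm -f bin" but emits no bytes.
; Real mode: linear address = segment * 16 + offset
STORE_OFS   equ 0x8000                  ; offset used by MOV [0x8000], AX

LIN_DS_0000 equ 0x0000 * 16 + STORE_OFS ; = 0x08000, low RAM above the boot sector
LIN_DS_07C0 equ 0x07C0 * 16 + STORE_OFS ; = 0x0FC00, a different spot in RAM
LIN_DS_F000 equ 0xF000 * 16 + STORE_OFS ; = 0xF8000, inside the F0000h-FFFFFh BIOS area
```

With DS = 0 the store lands in low RAM just above the boot sector. With any other value it lands elsewhere, possibly somewhere that is not writable RAM at all.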

2 Answers

2

It's impossible to do what you want to do while guaranteeing 100% safety.

The problem is that to store data anywhere you have to know you're storing it in a safe place (not overwriting your stack, not being overwritten by stack, not writing to ROM or something else that isn't RAM and not corrupting anything else in RAM like BIOS data or your code); and you have to modify register/s (mostly, a segment register) before you can know that you're storing data in a safe place, so you can't store the original value of those register/s anywhere safely. Note that this is what caused (at least one of) your original problems - not wanting to change DS (because you want to print its original value) and ending up not knowing if it's safe to use DS.

The "least unsafe" alternative is to (temporarily) use the stack that the BIOS left behind. It's likely that the BIOS left behind a stack somewhere that has enough space to ensure that if an IRQ occurs after the BIOS jumps to your code but before you can execute a single instruction (or set up a safe stack yourself) it won't cause a problem, and therefore likely that you can store a small amount of data on that stack. However, there is no guarantee that any interrupt (including both IRQs and BIOS functions) won't consume more stack than there will be left after you've consumed some (so you won't want to store lots of data on the stack); and ideally you'd transfer data stored on the stack somewhere else before enabling IRQs or calling any BIOS function.

This leads to something like the following (NASM syntax, untested):

    org 0x7C00

start:
    cli
    push ds
    push ax
    xor ax,ax
    mov ds,ax

    call far [.fixCSIP]     ;Push CS then IP, then load CS:IP from the far pointer below
.fixCSIP:
    dw .here, 0x0000        ;Far pointer: offset (IP) first, then segment (CS)
.here:

    pop ax                  ;ax = original value of "IP + (.fixCSIP - start)"
    sub ax,.fixCSIP-start   ;ax = original value of IP
    mov [0x8038],ax         ;Store original value of IP

    pop word [0x8020]       ;Move original value of CS

    pop word [0x8000]       ;Move original value of AX
    pop word [0x802C]       ;Move original value of DS

    ;SP is now back to the value it originally had

    mov [0x8010],sp
    mov [0x8024],ss

    xor ax,ax
    mov ss,ax
    mov sp,0x7C00

    ;Now CS:IP, DS and SS:SP are all "known safe" values, so we can start being normal

    sti

    mov [0x8004], bx
    mov [0x8008], cx
    mov [0x800C], dx

    ...
    
Brendan
  • What about `push 0` `pop ds` as a way to zero DS without trashing a register? Assuming, again, that you trust the stack to be able to hold 2 bytes. – Nate Eldredge Sep 26 '20 at 03:38
  • @NateEldredge: That would be fine too (probably a little better). – Brendan Sep 26 '20 at 03:42
  • 3
    You can `call 0:.here` directly, no need to add memory indirection. I'd also suggest storing the first several registers (particularly `ss` and `sp`) into the boot sector loader's position itself, starting at 0:7C00h. As it is in your answer, the initial stack may be placed at 0:8000h ish and may overwrite your stored values. – ecm Sep 26 '20 at 07:57
  • 1
    @NateEldredge: That has the disadvantage of not being available on Intel processors before the 80186. – Michael Petch Sep 26 '20 at 11:07
  • 2
    You also mixed up `cs` and `ip` on your stack frame. `ip` is to be popped first, then `cs`. – ecm Sep 26 '20 at 11:49
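
The two ways of zeroing DS discussed in these comments could look roughly like this. It is a fragment and not a complete boot sector, the labels are hypothetical, and both variants assume the BIOS-provided stack can take one extra word. The 8086 variant uses `mov ax, 0` and not `xor ax, ax` so that FLAGS stays untouched; that is a choice made for this sketch, not something taken from the answer.

```nasm
        bits 16

        cpu 186                 ; push imm needs an 80186 or later
zero_ds_186:
        push 0                  ; word 0000h onto the BIOS-provided stack
        pop ds                  ; DS = 0, no general register touched

        cpu 8086                ; works on the original 8086/8088 too
zero_ds_8086:
        push ax                 ; park AX on the stack
        mov ax, 0               ; unlike xor, mov leaves FLAGS alone
        mov ds, ax              ; DS = 0
        pop ax                  ; AX back to its original value
```

Either way the original DS has to be saved first (for example with `push ds`) if it is going to be displayed later.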
2

Quoting Brendan's answer:

The problem is that to store data anywhere you have to know you're storing it in a safe place (not overwriting your stack, not being overwritten by stack, not writing to ROM or something else that isn't RAM and not corrupting anything else in RAM like BIOS data or your code); and you have to modify register/s (mostly, a segment register) before you can know that you're storing data in a safe place, so you can't store the original value of those register/s anywhere safely.

The solution to this problem is to use the initial stack as set up by the ROM-BIOS, which should be safe for at least a few dozen bytes, and crucially to then store the first few registers' values into the space occupied by our own boot sector loader. This space is reserved for us and must not be overwritten by the initial stack as set up by the ROM-BIOS. After switching the stack to a known-good area we are allowed to use other memory too, though we do not need that for this example. Here's the NASM source (test.asm):

%if 0

Boot sector loader which displays register values
 by C. Masloch, 2020

Usage of the works is permitted provided that this
instrument is retained with the works, so that any entity
that uses the works is notified of this instrument.

DISCLAIMER: THE WORKS ARE WITHOUT WARRANTY.

%endif


        struc BS
bsJump: resb 3
bsOEM:  resb 8
bsBPB:
        endstruc

        struc EBPB              ;        BPB sec
bpbBytesPerSector:      resw 1  ; offset 00h 0Bh
bpbSectorsPerCluster:   resb 1  ; offset 02h 0Dh
bpbReservedSectors:     resw 1  ; offset 03h 0Eh
bpbNumFATs:             resb 1  ; offset 05h 10h
bpbNumRootDirEnts:      resw 1  ; offset 06h 11h -- 0 for FAT32
bpbTotalSectors:        resw 1  ; offset 08h 13h
bpbMediaID:             resb 1  ; offset 0Ah 15h
bpbSectorsPerFAT:       resw 1  ; offset 0Bh 16h -- 0 for FAT32
bpbCHSSectors:          resw 1  ; offset 0Dh 18h
bpbCHSHeads:            resw 1  ; offset 0Fh 1Ah
bpbHiddenSectors:       resd 1  ; offset 11h 1Ch
bpbTotalSectorsLarge:   resd 1  ; offset 15h 20h
bpbNew:                         ; offset 19h 24h

ebpbSectorsPerFATLarge: resd 1  ; offset 19h 24h
ebpbFSFlags:            resw 1  ; offset 1Dh 28h
ebpbFSVersion:          resw 1  ; offset 1Fh 2Ah
ebpbRootCluster:        resd 1  ; offset 21h 2Ch
ebpbFSINFOSector:       resw 1  ; offset 25h 30h
ebpbBackupSector:       resw 1  ; offset 27h 32h
ebpbReserved:           resb 12 ; offset 29h 34h
ebpbNew:                        ; offset 35h 40h
        endstruc

        struc BPBN              ; ofs B16 S16 B32 S32
bpbnBootUnit:           resb 1  ; 00h 19h 24h 35h 40h
                        resb 1  ; 01h 1Ah 25h 36h 41h
bpbnExtBPBSignature:    resb 1  ; 02h 1Bh 26h 37h 42h -- 29h for valid BPBN
bpbnSerialNumber:       resd 1  ; 03h 1Ch 27h 38h 43h
bpbnVolumeLabel:        resb 11 ; 07h 20h 2Bh 3Ch 47h
bpbnFilesystemID:       resb 8  ; 12h 2Bh 36h 47h 52h
        endstruc                ; 1Ah 33h 3Eh 4Fh 5Ah


        cpu 8086
        org 7C00h

start:
        jmp short entrypoint
        nop

        times (bsBPB + EBPB_size + BPBN_size) - ($ - $$) db 0

entrypoint:
        pushf
        cli                    ; An interrupt could use too much stack space
        cld
        push bx
        push ds
        call 0:.next           ; Set CS:IP to match ORG
.next:
        pop bx                 ; BX = IP of return address pushed by call
        sub bx, .next - start  ; calculate original IP on entry to start
        push bx
         push cs
         pop ds                ; DS=0 to match ORG
        mov bx, start
        pop word [bx + reg_ip]      ; store into start + BPB space 
        pop word [bx + reg_cs]
        pop word [bx + reg_ds]
        pop word [bx + reg_bx]
        pop word [bx + reg_fl]
        mov word [bx + reg_sp], sp
        mov word [bx + reg_ss], ss
        mov word [bx + reg_ax], ax
        xor ax, ax
        mov ss, ax
        mov sp, bx              ; set sp immediately after ss
        sti
        mov word [bx + reg_cx], cx
        mov word [bx + reg_dx], dx
        mov word [bx + reg_es], es
        mov word [bx + reg_si], si
        mov word [bx + reg_di], di
        mov word [bx + reg_bp], bp

        mov si, table
        ; bx -> start
loop_table:
        mov al, 32
        call disp_al
        lodsw
        call disp_al
        xchg al, ah
        call disp_al
        cmp al, 32
        jbe .next
        mov al, '='
        call disp_al
        mov ax, [bx]
        inc bx
        inc bx
        call disp_ax_hex
.next:
        cmp si, table.end
        jb loop_table

exit:
        xor ax, ax
        int 16h
        int 19h


disp_al:
        push ax
        push bx
        push bp

        mov ah, 0Eh
        mov bx, 7
        int 10h

        pop bp
        pop bx
        pop ax
        retn

disp_ax_hex:                    ; ax
                xchg al,ah
                call disp_al_hex                ; display former ah
                xchg al,ah                      ;  and fall through for al
disp_al_hex:                    ; al
                push cx
                mov cl,4                          ; ror al,4 would require 186
                ror al,cl
                call disp_al_lownibble_hex      ; display former high-nibble
                rol al,cl
                pop cx
                                                ;  and fall through for low-nibble
disp_al_lownibble_hex:
                push ax                  ; save ax for call return
                and al,00001111b                ; high nibble must be zero
                add al,'0'                      ; if number is 0-9, now it's the correct character
                cmp al,'9'
                jna .decimalnum          ; if we get decimal number with this, ok -->
                add al,7                        ;  otherwise, add 7 and we are inside our alphabet
 .decimalnum:
                call disp_al
                pop ax
                retn


        struc registerstorage
reg_ss: resw 1
reg_bp: resw 1
reg_sp: resw 1
reg_cs: resw 1
reg_ip: resw 1
reg_fl: resw 1
reg_ds: resw 1
reg_si: resw 1
reg_es: resw 1
reg_di: resw 1
reg_ax: resw 1
reg_bx: resw 1
reg_cx: resw 1
reg_dx: resw 1
        endstruc

%if registerstorage_size + start > entrypoint
 %error Entrypoint is not safe
%endif

        align 2
table:
        dw "SS"
        dw "BP"
        dw "SP"
        dw "CS"
        dw "IP"
        dw "FL"
        db 13,10
        dw "DS"
        dw "SI"
        dw "ES"
        dw "DI"
        db 13,10
        dw "AX"
        dw "BX"
        dw "CX"
        dw "DX"
        db 13,10
.end:

        times 510 - ($ - $$) db 0
        dw 0AA55h

Assemble with `nasm test.asm -f bin -o test.bin` and then load as a boot sector. Example:

 -boot protocol chain test.bin
 -r
 AX=0000 BX=0000 CX=F000 DX=0000 SP=7BF0 BP=07BE SI=07BE DI=0000
 DS=0000 ES=0060 SS=0000 CS=0000 IP=7C00 NV UP DI PL ZR NA PE NC
 0000:7C00 EB58              jmp     7C5A
 -g
  SS=0000 BP=07BE SP=7BF0 CS=0000 IP=7C00 FL=0046
  DS=0000 SI=07BE ES=0060 DI=0000
  AX=0000 BX=0000 CX=F000 DX=0000
 Boot load called
 -

(The part between `-g` and `Boot load called` is the output of the boot sector loader.)

ecm
  • 1
    You could maybe save a couple instructions early on with `mov [cs: start + reg_ip], bx` instead of `push bx` / ... / `pop [bx + reg_ip]`. Similarly, `push ds` / `pop` could just be a `mov [cs: bx + reg_ds], ds` after setting CS, before setting DS. Both of those changes are probably neutral or worse for code-size, though. CS override costs a byte, and so does `disp16` vs. `reg+disp8`. – Peter Cordes Sep 26 '20 at 20:22
  • 1
    You can save some code bytes by using a 4-iteration loop for the 4 hex digits, instead of that call / fall-through hack. https://godbolt.org/z/nef3EY shows my version, from the `mov si, table` to the `.end:` label. I got it down from 0x7E bytes for your version to 0x6E for mine. (I also removed some push/pop, letting functions clobber some registers, and have less work inside loops so it's more efficient, although that's basically irrelevant. Even with full save/restore registers in `disp_ax_hex` and `disp_al`, I think I still saved a few bytes.) – Peter Cordes Sep 26 '20 at 21:36
  • @Peter Cordes: Nice work! Just one comment, on your `disp_al` you wrote that "AL is preserved" through calling the interrupt 10h service 0Eh. This is [not always true](https://github.com/FDOS/kernel/pull/25), though we can expect that none of the calls will have to scroll the screen so this oddity would not occur. You only depend on `al` being preserved for `cmp al, ' '` though so `push ax` \ ... \ `pop ax` could be used for that alone. – ecm Sep 26 '20 at 21:46
  • 1
    Ugh, ok, so my version doesn't work on buggy BIOSes that don't work as documented (http://www.ctyme.com/intr/rb-0106.htm). If you're going to put a push/pop ax anywhere as a work-around for such buggy BIOSes, it could just be in `disp_al` if we only care about code-size, then you can go back to `lodsw` / `xchg al,ah` since the whole AX is getting stored/reloaded anyway. Or `cmp byte [si-1], ' '` so we don't depend on AX. Or `mov cl, al` / `call disp_al` / `cmp cl, ' '` at the cost of code-size. Or compare first, if `int 10h` preserves FLAGS? – Peter Cordes Sep 26 '20 at 21:57
  • 1
    @Peter Cordes: In my defense of the nested calls way of doing `disp_ax_hex`, other users of that code often also need `disp_al_hex` separately, so there's some sense to do `ax` with two `al` calls. Also extends quite naturally to `disp_dxax_hex`. – ecm Sep 26 '20 at 21:57
  • 1
    Ah yes, that makes sense if you already had that written as a generic extensible way to show a byte or word. It's a neat idea, it's just not as compact as possible if you do only ever need 4 hex digits. (Related: [How to convert a binary integer number to a hex string?](https://stackoverflow.com/q/53823756) is my Q&A about hex output, with some fun SSE2 and AVX512 versions, as well as scalar lookup table and cmov versions.) – Peter Cordes Sep 26 '20 at 21:59
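
For readers who want to see the shape of the "4-iteration loop" idea from these comments, here is a minimal sketch. It is not the code behind the godbolt link (that code is not reproduced in this thread). The routine name `disp_ax_hex_loop` and the test value are made up for illustration, and the routine assumes that `int 10h` service 0Eh preserves CX and DX, which is the documented behaviour but, as noted above, not something every BIOS honours for every register.

```nasm
        cpu 8086
        org 7C00h

start:
        cli
        xor ax, ax
        mov ss, ax
        mov sp, 7C00h           ; known-good stack just below the boot sector
        sti

        mov ax, 0123h           ; test value, as in the question
        call disp_ax_hex_loop   ; prints "0123" at the current cursor position
.halt:
        hlt
        jmp .halt

; Print AX as four hex digits. Clobbers AX, BX, CX, DX.
disp_ax_hex_loop:
        mov dx, ax              ; dx = value to print
        mov cx, 0404h           ; ch = digits left (4), cl = rotate count (4)
.digit:
        rol dx, cl              ; rotate next-highest nibble into bits 3..0
        mov al, dl
        and al, 0Fh             ; isolate that nibble
        add al, '0'             ; 0..9 -> '0'..'9'
        cmp al, '9'
        jbe .print
        add al, 'A' - '9' - 1   ; 10..15 -> 'A'..'F' (adds 7)
.print:
        mov ah, 0Eh             ; teletype output
        mov bx, 7               ; page 0, attribute 7
        int 10h
        dec ch
        jnz .digit              ; after 4 rotates dx holds its original value again
        retn

        times 510 - ($ - $$) db 0
        dw 0AA55h
```

The sketch never reads memory through DS, and near calls and jumps are relative, so it does not depend on whether the BIOS entered at 0000h:7C00h or 07C0h:0000h.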