
I am following Nick Blundell's tutorials for boot sector programming (https://www.cs.bham.ac.uk/~exr/lectures/opsys/10_11/lectures/os-dev.pdf and https://www.youtube.com/watch?v=YvZhgRO7hL4). My code works just fine in the QEMU emulator. However, when I run it on a physical machine, it crashes whenever I start referencing the segment registers. My teachers at school are not familiar with low-level programming and cannot help me. Here is my bootloader; I have annotated the lines that cause the crash with the string CRASH. (Note: when I say crash, the machine actually just goes on to load my OS from the next disk. I am loading this code from an external HDD.)

[bits 16]
[org 0x7c00]        

mov bp, 0xffff
mov sp, bp
mov ax, 0x0000
mov ds, ax
;; mov es, ax ;; CRASH
;; mov ss, ax ;; CRASH

mov si, BOOT_MSG
call print_string
call print_newline

mov si, INIT_SEG_MSG
call print_string
call print_newline

;; mov dx, ds ;; CRASH
;; call print_hex
;; call print_newline

;;mov dx, cs ;; CRASH
;;call print_hex
;;call print_newline

;;mov dx, es ;; CRASH
;;call print_hex
;;call print_newline

;;mov dx, ss ;; CRASH
;;call print_hex
;;call print_newline

;; mov dl, 0x80         ;; disk where kernel is
;; mov cl, 3            ;; start sect
;; mov al, 1            ;; num sect
;; mov bx, 0x7ef0       ;; RAM addr
;; call load_kernel

;; mov si, KERN_MSG
;; call print_string
;; call print_newline

;;  call switch_to_pm

jmp $

%include "print.asm"
%include "print_hex.asm"
%include "disk.asm"
%include "pm.asm"

[bits 32]
pm :
    mov esi, PM_MSG
    call print_string_pm
    jmp 0x7ef0
    jmp $
[bits 16]

BOOT_MSG : db 'booted 16-bit to 0x7c00',0
KERN_MSG : db 'loaded kernel to es 0x7ef0',0
PM_MSG : db 'switched to 32-bit mode',0
INIT_SEG_MSG : db 'init segment registers',0

times 510-($-$$) db 0
dw 0xaa55

I am sure I have a fundamental misunderstanding; any help will be appreciated. Here are my printing routines:

print_string :
    push ax
    _loop :
        lodsb
        cmp al, 0
        je _end

        mov ah, 0x0e
        int 0x10
        jmp _loop
    _end :
        pop ax
        ret

print_hex :
    mov si, HEX_TEMPLATE

    mov bx, dx
    shr bx, 12
    mov bx, [bx+HEXABET]
    mov [HEX_TEMPLATE+2], bl

    mov bx, dx      ;; bx -> 0x1234
    shr bx, 8       ;; bx -> 0x0012
    and bx, 0x000f  ;; bx -> 0x0002
    mov bx, [bx+HEXABET]
    mov [HEX_TEMPLATE+3], bl

    mov bx, dx      
    shr bx, 4
    and bx, 0x00f   
    mov bx, [bx+HEXABET]
    mov [HEX_TEMPLATE+4], bl

    mov bx, dx      
    and bx, 0x0f    
    mov bx, [bx+HEXABET]
    mov [HEX_TEMPLATE+5], bl

    call print_string
    ret 

    HEX_TEMPLATE : db '0x???? ',0
    HEXABET : db '0123456789abcdef'

print_newline :
    pusha
    mov ah, 0x0e
    mov al, 0x0d
    int 0x10
    mov al, 0x0a
    int 0x10
    popa
    ret
  • `mov bp, 0xffff` `mov sp, bp` is potentially bad. The stack is pointed to by SS:SP, and you don't know what _SS_ is set to. You should be setting both SS and SP. PS: the stack pointer shouldn't be an odd-numbered address (for performance reasons). To set the stack just below the bootloader you could use `xor ax, ax` `mov ss, ax` `mov sp, 0x7c00`. `xor ax,ax` will zero out _AX_ (you could use the longer instruction `mov ax, 0x0000`). – Michael Petch Sep 01 '16 at 22:16
  • The actual `mov dx, ds` type instructions will not crash by themselves. It has to be something else (something done after that mov). – Michael Petch Sep 01 '16 at 22:19
  • What type of processor is on your real hardware? 8086? 80286? 80386? – Michael Petch Sep 01 '16 at 22:42
  • The processor is an Intel Atom (32-bit). I assume the emulator is automatically setting my segment registers. Also, this code: `xor ax, ax` `mov ss, ax` `mov ds, ax` `mov sp, 0x7c00` `mov bp, sp` `jmp $` will cause a 'crash' as well. – disassembler Sep 01 '16 at 22:46
  • The emulator and real hardware would have set up an SS:SP pair for itself during the BIOS startup. But you can't just change half of it like you are doing. If you change SP you need to change SS because you have no idea what value the BIOS was using for SS. And physical memory is computed by (segment * 16)+offset. It is possible to have your stack pointing into some potentially unwriteable region of memory for all you know. You may just get lucky that the code works. – Michael Petch Sep 01 '16 at 22:49
  • Oh, there is one thing I totally missed. You should not hard-code the drive number in your disk read routine like `mov dl, 0x80`. The actual drive number is passed by the BIOS to your bootloader in the _DL_ register, and you should be using that value. The disk read interrupt will also need _ES_ to be 0 (not just DS). – Michael Petch Sep 01 '16 at 23:58
  • You should also explicitly set the direction flag (forward) with _CLD_ . – Michael Petch Sep 01 '16 at 23:59
  • You know you can boot your boot-loader in BOCHS and examine registers and memory with its built-in debugger, right? That doesn't help when it works in BOCHS with BOCHS's bios, but fails on real hardware, though. – Peter Cordes Sep 02 '16 at 00:20
  • Yes, I wish the emulator weren't so 'kind' and actually punished my mistakes. I am rebooting my Intel Atom PC to test my code. I am doing more research on segmentation and will accept Shift_Left's answer once it works. Just in case: setting the org to 0x7c00 with only a `jmp $` instruction works fine, however. – disassembler Sep 02 '16 at 00:30
  • Some notes: when doing `mov ss,something`, do `mov sp,offset` immediately after it. That's how the two are meant to be paired: `mov ss,??` delays any pending interrupt by one instruction, so `mov sp,??` is guaranteed to run right after it, completing the `ss:sp` setup. The exception is when you do `sti` right before `mov ss`, which also delays interrupts by one instruction; the two delays collide, and an interrupt can then happen before `mov sp`... :D <3 x86. BTW, when dealing with a modern BIOS, if you run into this sort of "crash" after a few seconds' delay, verify there's no "watch-dog" reset timer. – Ped7g Sep 02 '16 at 11:41
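Pulling the advice in these comments together (load SS and SP back to back, set DS and ES yourself, clear the direction flag before relying on LODSB, and keep the drive number the BIOS passes in DL), a minimal startup sequence might look like the sketch below. It is not the tutorial's code: it assumes `[org 0x7c00]` and a stack placed just below the boot sector as suggested above, and the `begin` and `BOOT_DRIVE` labels are hypothetical names.

```nasm
[bits 16]
[org 0x7c00]

    jmp 0x0000:begin        ; far jump: force CS = 0 so it matches org 0x7c00
begin:
    xor ax, ax              ; AX = 0
    mov ds, ax              ; DS = 0: [label] references match org 0x7c00
    mov es, ax              ; ES = 0: int 0x13 reads land at ES:BX
    mov ss, ax              ; SS = 0; interrupts are held off for one instruction...
    mov sp, 0x7c00          ; ...so SP is loaded before anything can use SS:SP
    mov bp, sp              ; stack grows down from just below the boot sector
    cld                     ; make LODSB/STOSB walk forward through memory
    mov [BOOT_DRIVE], dl    ; DL = boot drive number passed in by the BIOS

    ; ... print messages, load the kernel, etc. ...

    jmp $                   ; hang

BOOT_DRIVE: db 0            ; hypothetical storage for the boot drive number

times 510-($-$$) db 0
dw 0xaa55
```

Assembled with `nasm -f bin`, this produces a 512-byte boot sector; the question's `print_string` and `load_kernel` calls would go where the placeholder comment is.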

2 Answers


BIOSes have a peculiarity in that the state of CS is unknown. In Bochs and probably QEMU, CS = 0, so code with an origin of 0x7C00 will work. Real hardware may instead pass CS = 0x7C0 (with IP = 0), so without an appropriate far jump at the beginning of the code, calls to near absolute addresses would be skewed by 0x7C00 bytes when the origin is set to 0x7C00.

Solutions:

    org    0x7C00

    jmp    0:Begin                   ; Far jump so CS = 0
Begin:
    mov    ax, cs
    mov    ds, ax
    mov    es, ax

or

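    org    0
    jmp    0x7C0:Begin               ; Far jump so CS = 0x7C0 (jumping to 0x7C0:0 would just re-run this jmp forever)
Begin:
    mov    ax, cs
    mov    ds, ax
    mov    es, ax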

This is probably what's happening: the crash comes at `call print_string`, which would actually be looking for code at 0xF8??.
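The 0xF8?? figure can be checked with the real-mode address formula mentioned in the comments, physical = segment × 16 + offset. Both CS:IP pairs below reach the same byte, but if CS = 0x7C0 and the code uses a near *absolute* target (a register-indirect call or a jump-table entry, say) assembled against `org 0x7C00`, the 0x7C00 base is added twice. Here x is a hypothetical label's distance from the start of the boot sector. (As the comments on this answer point out, ordinary relative `call`/`jmp` encodings are not affected.)

```latex
\begin{aligned}
\text{physical} &= \text{segment} \times 16 + \text{offset} \\[4pt]
\mathtt{0000{:}7C00} &\;\to\; \mathtt{0x0000} \times 16 + \mathtt{0x7C00} = \mathtt{0x7C00} \\
\mathtt{07C0{:}0000} &\;\to\; \mathtt{0x07C0} \times 16 + \mathtt{0x0000} = \mathtt{0x7C00} \\[4pt]
\text{CS} = \mathtt{0x07C0},\ \text{offset} = \mathtt{0x7C00} + x &\;\to\; \mathtt{0x7C00} + \mathtt{0x7C00} + x = \mathtt{0xF800} + x
\end{aligned}
```

For any label in the first 256 bytes of the sector, x < 0x100, which gives the 0xF8?? range.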

Shift_Left
  • He does set _DS_ to zero with `mov ax, 0x0000` `mov ds, ax`. _CS_ only has to be set properly for near absolute jumps or if he's using jump tables, neither of which he's doing. Setting _ES_ is only necessary if you need it set. The code shown doesn't need it at the point of the crash (although it would be needed for the disk read interrupt). Since he has commented out all the calls, it is hard to tell what he was using when it didn't work. Did it include the disk read code? I don't know. SS:SP is another issue. If any two things stand out as possible culprits, they are SS:SP and ES. – Michael Petch Sep 01 '16 at 23:46
  • What applies to near absolute jumps applies to near absolute calls as well. None in the code use that type. – Michael Petch Sep 01 '16 at 23:50
  • One other missing thing: the code relies on the direction flag being set properly. He should explicitly use _CLD_ to make the direction forward, since the OP's use of _LODSB_ assumes that to be the case. – Michael Petch Sep 01 '16 at 23:54
  • I've stripped the code down to just setting up the memory segments, since this is now confirmed to be the source of the crash... the following `[bits 16]` `[org 0x7c00]` `jmp 0:begin` `begin:` `mov ax, cs` `mov ss, ax` `mov ds, ax` `jmp $` `times 510-($-$$) db 0` `dw 0xaa55` still causes the computer to boot the OS from the next disk... – disassembler Sep 02 '16 at 00:08
  • It would seem that is a BIOS issue then, in that it is not recognizing a floppy, USB or some other medium. You have to either interrupt the boot process so you can select device or change device priorities in settings. – Shift_Left Sep 02 '16 at 01:14
  • Well it will boot and hang until I start to reference the segments. My understanding of x86 segment addressing is incorrect, I am continuing some more research on this. I will accept your answer for pointing me to the right direction. Thanks guys... – disassembler Sep 02 '16 at 02:15
  • @disassembler Regarding the skipped boot, see [this](https://stackoverflow.com/questions/39240353/nasm-80x86-bootloader-needs-xor-ax-ax/). – Margaret Bloom Sep 02 '16 at 05:31
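Two points from the comments (set ES before the disk read interrupt, and take the drive number from DL instead of hard-coding 0x80) apply to the question's `load_kernel` call. Since `disk.asm` isn't shown, the following is only a sketch of a CHS read with BIOS `int 0x13`, function `AH = 0x02`. It assumes `[org 0x7c00]`, DS = 0, and that the boot drive was saved from DL at startup; `read_sectors`, `disk_error`, and `BOOT_DRIVE` are hypothetical names.

```nasm
[bits 16]

; Read AL sectors from cylinder 0, head 0, starting at sector CL (1-based)
; into 0000:BX, using the boot drive number saved from DL at startup.
; In: AL = sector count, CL = starting sector, BX = destination offset
read_sectors:
    push ax                  ; save AL = requested sector count
    xor ax, ax
    mov es, ax               ; ES = 0: int 0x13 writes to ES:BX
    pop ax                   ; restore AL = sector count
    push ax                  ; keep a copy to check the result
    mov ah, 0x02             ; BIOS function: read sectors (CHS)
    mov ch, 0                ; cylinder 0
    mov dh, 0                ; head 0
    mov dl, [BOOT_DRIVE]     ; drive number the BIOS passed in DL at boot
    int 0x13                 ; CF = 1 on error; AL = sectors actually read
    pop dx                   ; DL = sectors requested (POP leaves CF intact)
    jc disk_error
    cmp al, dl               ; did we get everything we asked for?
    jne disk_error
    ret

disk_error:
    jmp $                    ; hypothetical: just hang; a real loader would report

BOOT_DRIVE: db 0             ; hypothetical: filled with mov [BOOT_DRIVE], dl at startup
```

With the question's parameters this would be called as `mov al, 1` / `mov cl, 3` / `mov bx, 0x7ef0` / `call read_sectors`.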

So after many hard resets, I have found the solution. The problem was that what I thought was the problem WASN'T THE PROBLEM. I decided to take my flash drive and boot from it on my dad's PC. It worked perfectly. So I continued to write my bootloader on my own PC. I wrote the code to enter protected mode, ran it: perfect, no problems. Then I made another change, copied the bytes to the flash drive using dd, and ran it on my dad's PC. But wait: the change didn't show up... So I ran dd a few more times, zeroed the drive, and still nothing. So I REBOOTED MY COMPUTER, RAN DD AGAIN, AND IT WORKED. The problem seemed to be the way my OS (Ubuntu 16, not sure if it is relevant) handled the dd command: it seemed that writes to /dev/sdb were being buffered and never actually reached the disk, at least after I had removed the flash drive the first time. Maybe it is a bug in the OS. Maybe it is a bug in dd. Maybe I'm missing something. Who knows. The real problem (as stated in the comments) is that I was unaware of how Linux block devices work; see sync. The changes I made were CACHED and not written. Either way, I will be using an emulator from now on. Thanks everyone!

  • Did you [run `sudo sync`](http://man7.org/linux/man-pages/man1/sync.1.html) or `eject /dev/sdb` before yanking out the USB drive? Linux block devices aren't synchronous by default. Also, are you sure the drive you wanted was under sdb when you re-inserted it? Look in `/dev/disk/by-id` (and/or the other by-whatever directories) – Peter Cordes Sep 02 '16 at 06:23
  • Anyway, yes, only test on real hardware once it works on emulated hardware. Or not at all if you're just doing this for fun. In theory it would be good to know what you're assuming about the state of things when the BIOS jumps to your code, to make sure you have a solid handle on what all the segments and registers are doing. – Peter Cordes Sep 02 '16 at 06:32
  • I'm assuming the BIOS will set es, ds, and ss (fs and gs?) to some place in memory that is free. Printing a hex value of [0x7c00] will print the first byte of my code. Is cs set to 0x7c00? Since the BIOS IVT is at 0x0000, I'm assuming attempting to write there is illegal... – disassembler Sep 02 '16 at 06:43
  • IDK, I mostly use asm for tuning user-space code to run fast, not writing bootloaders. @MichaelPetch can probably tell you which if any of those things are true. I wouldn't count on anything you don't absolutely have to, though; defensive coding and worst-case assumptions seem to be the way to go when dealing with real BIOSes in the wild. And BTW, it's real mode, so you can write anywhere you want. You can overwrite IVT entries if you want, e.g. to hook the interrupt with your own handler which does something and then jumps to the original handler. – Peter Cordes Sep 02 '16 at 06:49
  • I think you can count on `ss:sp` being safe, at least if interrupts are enabled when the BIOS jumps to your code. Otherwise a timer interrupt could overwrite something. Other than that, a badly-written non-standards-compliant BIOS could probably still work well enough for a company to ship it (i.e. it loads and jumps to the MBR code, and boots Windows) while violating most of your assumptions. You can count on `cs:ip` pointing to your code, but you can't count on the actual value of `cs`. There are some standards for the BIOS passing info about boot disks to the MBR, but code defensively. – Peter Cordes Sep 02 '16 at 06:51
  • @PeterCordes : SS:SP will originally point at something but you don't know where. And if you don't know where you could end up using that same memory space for doing something like - well in this fellows code - writing the second stage in memory over top of it. Always set your own stack. Then you know where it is with less chance of interfering with your code. – Michael Petch Sep 02 '16 at 14:01
  • The only thing you can really count on these days (with the exception of some equipment from decades ago) is that _DL_ will have the boot drive number passed in it. As for everything else (including all the segment registers), don't make any assumptions that they contain specific values. I cover this in my [General Bootloader Tips](http://stackoverflow.com/questions/32701854/boot-loader-doesnt-jump-to-kernel-code/32705076#32705076) – Michael Petch Sep 02 '16 at 14:02
  • On a side note, I use `udisks` to detach arbitrary types of USB media. There is a [Unix Stack Exchange](http://unix.stackexchange.com/questions/35508/eject-usb-drives-eject-command/129282#129282) answer with a reasonable explanation. It also works for USB hard drives. – Michael Petch Sep 02 '16 at 14:15