I am trying to write my own bootloader. While it works fine in QEMU, Bochs and VirtualBox, I cannot seem to make it work on my laptop.

On my laptop, the bootloader behaves very differently from how it does in all the emulators: it hangs in seemingly random places, refuses to print, and even skips some jmp $ instructions.

Although I see several different problems on real hardware, I suspect they all share a single cause.

The following code is a short bootloader that should print a "TEST" message 3 times, then hang by jumping to the same location:

[BITS 16]                                                                          
[ORG 0x7C00]                                                                                                    
    jmp 0x0000:start_16  ; In case bootloader is at 0x07C0:0x0000                                                             
start_16:                                                                          
    xor ax, ax                                                                 
    mov ds, ax                                                                 
    mov es, ax                                                                 
    cli                             ; Disable interrupts                       
    mov ss, ax                                                                 
    mov sp, 0x7C00                                                             
    sti                             ; Enable interrupts                        
    cld                             ; Clear Direction Flag                     
    ; Store the drive number                                                   
    mov [drive_number], dl                                                     
    ; Print message(s)                                                         
    mov si, msg                                                                
    call print_string                                                          
    mov si, msg                                                                
    call print_string                                                          
    mov si, msg                                                                
    call print_string                                                          

    jmp $   ; HALT                                                                                   

; print_string                                                                     
;       si      = string                                                           
print_string:                                                                      
    pusha                                                                      
    mov ah, 0x0E                                                               
.repeat:                                                                           
    lodsb                                                                      
    cmp al, 0x00                                                               
    je .done                                                                   
    int 0x10                                                                   
    jmp short .repeat                                                          
.done:                                                                             
    popa                                                                       
    ret                                                                        

; Variables                                                                        
drive_number db 0x00                                                               
msg db 'TEST', 0x0D, 0x0A, 0x00                                                    
times 510-($-$$) db 0x00                                                           
db 0x55                                                                            
db 0xAA

Compile and emulate the code with:

$ nasm -f bin bootloader.asm
$ qemu-system-x86_64 bootloader

On emulators, it prints "TEST" three times and hangs. On my laptop, it prints "TEST" once, followed by 3 weird characters:

Bootloader output on my laptop.

Most bootloader code from http://wiki.osdev.org does not work either. For example, none of the code-snippets from http://wiki.osdev.org/Babystep2 work on my laptop.

What is wrong with my code? How can I fix it?


Additional information

If I remove the 2 redundant mov si, msg instructions, the "TEST" message is printed twice.

Laptop:

  • Asus Vivobook S200,
  • CPU: Intel i3-3217U
  • BIOS: American Megatrends, Version 210.
  • The computer works just fine with any other bootloader such as Grub.

Assembly and writing:

$ nasm -f bin bootloader.asm
$ qemu-system-x86_64 bootloader # TEST 1 
$ sudo dd if=/dev/zero of=/dev/sdd bs=1M count=1 # clean the USB 
$ sudo dd if=bootloader of=/dev/sdd conv=fsync # write to USB 
$ qemu-system-x86_64 /dev/sdd # TEST 2 

Edit 1

Ross Ridge noticed in the comments that Ω♣| are the first 3 bytes of the bootloader.
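
This is consistent with how the far jump at the top of the listing is encoded. A far `jmp` is opcode 0xEA followed by a 16-bit offset and a 16-bit segment, so, assuming `start_16` lands at 0x7C05 (directly after the five-byte jump), the sector begins with the bytes EA 05 7C, which code page 437 renders as Ω, ♣ and |. A minimal sketch that spells the same instruction out by hand:

```nasm
[BITS 16]
[ORG 0x7C00]
    ; Hand-encoded equivalent of: jmp 0x0000:start_16
    db 0xEA                 ; far JMP opcode           -> CP437 'Ω'
    dw start_16             ; offset 0x7C05 (05 7C)    -> CP437 '♣', '|'
    dw 0x0000               ; segment 0x0000
start_16:
    jmp $                   ; hang here

times 510-($-$$) db 0x00    ; pad to the boot signature
dw 0xAA55                   ; same bytes as db 0x55, db 0xAA
```

Comparing a hex dump of this binary with the original one should show identical first five bytes.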

Edit 2

Updated print function and string:

print_string:                                                                      
    pusha                                                                          
.repeat:                                                                        
    mov ah, 0x0E                                                                
    xor bx, bx                                                                  
    cld                             ; Clear Direction Flag                      
    lodsb                                                                       
    cmp al, 0x00                                                                
    je .done                                                                    
    int 0x10                                                                    
    jmp short .repeat                                                           
.done:                                                                          
    popa                                                                        
    ret 

msg db 'TEST', 0x00 

Output

A single TEST. The other two are missing.

Edit 3

As suggested by Ross Ridge, dumpregs has been added for better debugging. The int 0x10 does not modify any registers. After some testing, I placed calls to dumpregs around the drive_number assignment, with a jmp $ immediately after. The code should print one line of register dump and halt. Instead, it continues:

Screenshot: dumpregs around drive_number assignment

The full code: https://gist.github.com/anonymous/0ddc146f73ff3a13dd35

Edit 4

Disassembly of the current bootloader using:

$ ndisasm -b16 bootload2 -o 0x7c00

https://gist.github.com/anonymous/c9384fbec25513e3b815

Michael Petch
Janman
  • It's not an answer, but I have found that Asus machines can have weird behavior with custom boot loaders. One of our projects involves a custom disk format and initial loader.. Runs great on everything except an Asus. The asus has the same behavior that you're describing. I'm not saying it's the Asus, per se... it's likely something that we've done incorrectly.. but the hardware seems less forgiving than other hardware. :) – David Hoelzer Jan 23 '16 at 17:47
  • Try to keep interrupts disabled? Also try to put the `mov ah, 0x0E` into the loop. And `push si`/`pop si` around the `int 0x10`. – Jester Jan 23 '16 at 17:49
  • @Jester I have tried your suggestions. The only thing that changed is the printed string: "TEST ≡S " – Janman Jan 23 '16 at 17:58
  • @DavidHoelzer I also think there might be some bug in Asus machines, but I just cannot believe one single interrupt call to BIOS can go so wrong. – Janman Jan 23 '16 at 17:59
  • I hear you, @janman... However, since Linux and Windows seem to boot just fine, I'm inclined to believe that it's us. ;) – David Hoelzer Jan 23 '16 at 18:00
  • Out of curiosity what happens if you move `cld` into print_string *after* `repeat:`? This may seem like an odd test, but I'm wondering if under some circumstances on the ASUS `int 0x10` is setting the direction flag which might cause subsequent writes to print garbage (it wouldn't explain the jmps being affected). Another thing I would try is removing the `0x0D, 0x0A` from the string as a test. Maybe there is a bug processing one of these special characters. – Michael Petch Jan 23 '16 at 18:35
  • Something I'd consider is setting `bh` to 0, just to make sure `int 0x10` is always using the active page (page 0 in this case) – Michael Petch Jan 23 '16 at 18:37
  • The characters `Ω♣|` are the first three bytes of your program, but I can't see how they would be printed by that code. – Ross Ridge Jan 23 '16 at 18:45
  • @MichaelPetch I have updated my print function with `cld`, `mov ax, 0x0E`, and `xor bx, bx` to ensure correct interrupt call arguments, but with no luck. – Janman Jan 23 '16 at 18:50
  • @RossRidge Well spotted! – Janman Jan 23 '16 at 18:50
  • Did you put all three of those instructions (as a test) after the `repeat:` so that they are all initialized before each call to `int 0x10` – Michael Petch Jan 23 '16 at 18:51
  • @MichaelPetch Yes. I have posted the updated print function under **Edit 2**. – Janman Jan 23 '16 at 18:58
  • Besides Jester's other comment about pushing and popping _SI_ around the `int 0x10`, the only other possibility I can think of that would explain the `jmp $` being skipped over is if somehow the stack is being trashed and the return address messed up. What happens if you move the stack SS:SP somewhere else as an experiment? Maybe try SS:SP of 0x9000:0x0000. – Michael Petch Jan 23 '16 at 19:00
  • @MichaelPetch Setting SS to 0x9000 and SP to 0x0000 made the laptop print the string once followed by the first 3 bytes of the bootloader: `TESTΩ♣|` – Janman Jan 23 '16 at 19:08
  • I assume based on that output you have removed the CR and LF from the string? Or did the strings come out on separate lines? If you haven't experimented with removing 0x0a and 0x0d from `msg`, I'd recommend at least trying it and see what happens. – Michael Petch Jan 23 '16 at 19:10
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/101481/discussion-between-janman-and-michael-petch). – Janman Jan 23 '16 at 19:12
  • I'm also going to assume that in the BIOS you had disabled secure boot, and enabled "Launch CSM"? – Michael Petch Jan 23 '16 at 22:06
  • @MichaelPetch Yes, CSM enabled and SecureBoot disabled :) – Janman Jan 23 '16 at 23:29

2 Answers


It now looks like the BIOS may be modifying the BIOS parameter block (BPB) that it incorrectly assumes is part of the bootsector loaded into memory. It might feel it needs to because the geometry it assigns to the USB device when booting from it can differ from the geometry used when the bootsector and its presumed BPB were written to the device. Since the presence of a jump instruction at the start of a bootsector is one of the ways applications test for the existence of a BPB, you can try inserting some other instruction (but not a NOP) at the start of your bootsector. For example:

[BITS 16]                                                                          
[ORG 0x7C00]
    xor ax,ax                                                                                                
    jmp 0x0000:start_16  ; In case bootloader is at 0x07C0:0x0000                                                             
start_16:                       

Note that a far jump doesn't appear to be one of the normal indicators of the presence of a BPB. In fact it's pretty conclusive evidence that a BPB isn't present: the OEM name and BPB fields start at offset 3, right after a 3-byte jump (or a 2-byte short jump plus NOP), and a far jump instruction is 5 bytes long.

If that doesn't work you can try reserving space for a BPB, something like this:

[BITS 16]                                                                          
[ORG 0x7C00]
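    jmp short start         ; 2-byte short jump, so jump + nop is exactly 3 bytes
    nop
    times 8 + 25 db 0       ; placeholder: 8-byte OEM name + 25-byte BPB (resb would only warn and zero-fill in -f bin)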
start:                                                                                          
    jmp 0x0000:start_16  ; In case bootloader is at 0x07C0:0x0000                                                             
start_16:                       
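
For completeness, here is a sketch of how the whole test sector from the question might look with that placeholder in place. It is only an assumption-laden combination of the code above with the question's listing: the `.again` loop label is introduced here for illustration, and the 8 + 25 byte size is simply the value used above, not something verified against any particular BIOS.

```nasm
[BITS 16]
[ORG 0x7C00]
    jmp short start             ; 2-byte short jump ...
    nop                         ; ... padded to the usual 3 bytes
    times 8 + 25 db 0           ; placeholder: 8-byte OEM name + 25-byte BPB
start:
    jmp 0x0000:start_16         ; normalise CS:IP to 0x0000:offset
start_16:
    xor ax, ax
    mov ds, ax
    mov es, ax
    cli                         ; no interrupts while SS:SP is inconsistent
    mov ss, ax
    mov sp, 0x7C00              ; stack grows down from below the sector
    sti
    cld                         ; lodsb must move forwards
    mov [drive_number], dl      ; boot drive as passed in by the BIOS

    mov cx, 3                   ; print the message three times
.again:
    mov si, msg
    call print_string           ; preserves CX through pusha/popa
    loop .again

    jmp $                       ; hang

; print_string: SI = zero-terminated string
print_string:
    pusha
    mov ah, 0x0E                ; BIOS teletype output
    xor bx, bx                  ; page 0
.repeat:
    lodsb
    test al, al
    jz .done
    int 0x10
    jmp short .repeat
.done:
    popa
    ret

drive_number db 0x00
msg db 'TEST', 0x0D, 0x0A, 0x00

times 510-($-$$) db 0x00
dw 0xAA55                       ; boot signature (bytes 0x55, 0xAA)
```

With this layout, `start` sits at offset 0x24. If the firmware turns out to write beyond the 25-byte BPB, the placeholder can be extended, for example with `times 0x3E - ($ - $$) db 0`, since 0x3E is where boot code begins in a FAT12/FAT16 boot sector.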

Here's some code you can use to dump registers to help debug the problem:

dumpregs:
    push    es
    pusha
    push    0xb800
    pop es  
    mov di, [vidmem_ptr]
    mov bp, sp
    mov cx, 8
dump_loop:
    dec bp
    dec bp
    mov ax, [bp + 16]
    call    printhex2
    inc di
    inc di
    loop    dump_loop
    mov [vidmem_ptr], di
    popa
    pop es
    ret

printhex2:
    push    ax
    mov al, ah
    call    convhex1
    pop ax
convhex1:
    aam 16
    ; DB    0D4h, 16
    xchg    al, ah
    call    convhex
    mov al, ah
convhex:
    cmp al, 10
    jb  lessthan_10
    add al, 'A' - '0' - 10
lessthan_10:
    add al, '0'
    stosb
    mov al, 7
    stosb
    ret

vidmem_ptr dw 5 * 80 * 2 ; start at row 5

It uses direct video writes to dump all the general purpose registers. It uses the PUSHA/POPA register order: AX CX DX BX SP BP SI DI

If you bracket the INT 0x10 instruction in your original code with calls to this function, you should get output like this:

Screenshot of boot with debug output

In particular, you want to make sure the 8 numbers on the left-hand side match the 8 numbers on the right-hand side on each line of debug output. The BIOS call you're using shouldn't change any registers.
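
As a sketch of what "bracketing" could look like, the fragment below is a drop-in variant of `print_string` from the question. It is not standalone: it assumes `dumpregs`, `printhex2` and `vidmem_ptr` from the listing above are assembled into the same file and that DS is 0, as in the question's setup.

```nasm
; print_string with register dumps around the BIOS call
;       si      = string
print_string:
    pusha
    mov ah, 0x0E
.repeat:
    lodsb
    cmp al, 0x00
    je .done
    call dumpregs           ; registers just before int 0x10 (left half of the row)
    int 0x10
    call dumpregs           ; registers just after int 0x10 (right half of the row)
    jmp short .repeat
.done:
    popa
    ret
```

Each call to `dumpregs` writes 8 registers at 4 hex digits plus one skipped cell each, which is 40 character cells, so two calls fill exactly one 80-column row per printed character.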

Ross Ridge
  • I must be going crazy, I hope you're ready for this. I have copied your code and tried it. While it only printed "TEST" twice, all registers matched on both sides. Playing around with your function, I cleaned up the code by removing `drive_number` variable and its assignment, **and then test got printed correctly!**. To verify this, I placed your `dumpregs` function around `mov [drive_number], dl`, adding a `jmp $` after that, but the code printed "TEST" once nonetheless. You can see the full modified code and screenshot under **Edit 3** in OP. – Janman Jan 23 '16 at 23:26
  • This solved my issue - the BIOS was overwriting parts of my bootloader with BPB, hence, some instructions overwritten. This caused all sorts of issues, such as overwriting the `jmp $`. – Janman Jan 24 '16 at 12:07

I've seen similar symptoms before. The scenario is probably something like:

a) You're booting the laptop from USB flash

b) The BIOS looks at the first sector of the device and tries to decide if it should treat it like a floppy disk or like a hard drive. It fails to find a valid partition table, so it decides to treat it like a floppy disk.

c) The BIOS tries to be "helpful" and patches some fields in the BPB.

d) You have no BPB so your code and/or data gets trashed, causing strange things to happen (where minor changes in your code cause different strange things to happen).

Sadly, I don't have hardware that does this myself. I don't know why the BIOS assumes there's a BPB, which of the several different "BPB formats" it assumes is present, what it thinks it is patching, or why it thinks it's necessary to patch anything. I only know that some BIOSes do trash the bytes at offsets 0x1C to 0x1F and the byte at offset 0x24.
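
One way to find out what a particular machine patches is to fill the would-be BPB area with a known byte and have the boot sector report any byte that no longer matches. The following is only a sketch and has not been run on affected hardware: `FILL`, `AREA_END` and the routine names are made up for this example, and 0x5A is chosen because it is where boot code starts in a FAT32 boot sector, so the area covers the common BPB variants. A BIOS may also behave differently when the area does not look like a plausible BPB, and a patch that happens to write the fill value goes unnoticed.

```nasm
[BITS 16]
[ORG 0x7C00]
FILL     equ 0xA5               ; hypothetical marker value
AREA_END equ 0x5A               ; hypothetical end of the watched area

    jmp short start
    nop
area:
    times AREA_END - ($ - $$) db FILL
start:
    jmp 0x0000:start_16
start_16:
    xor ax, ax
    mov ds, ax
    mov es, ax
    cli
    mov ss, ax
    mov sp, 0x7C00
    sti
    cld
    mov si, area
.scan:
    lodsb                       ; AL = byte at SI, SI advances
    cmp al, FILL
    je .next
    push ax                     ; keep the modified value
    mov ax, si
    dec ax                      ; address of the byte just read
    sub ax, 0x7C00              ; offset within the sector (fits in AL)
    call print_hex_byte         ; print the offset
    mov al, ':'
    call print_char
    pop ax
    call print_hex_byte         ; print the value found there
    mov al, ' '
    call print_char
.next:
    cmp si, start
    jb .scan
    jmp $                       ; hang when the scan is finished

; print AL as two hex digits (clobbers AL)
print_hex_byte:
    push ax
    shr al, 4                   ; high nibble first
    call print_nibble
    pop ax
    and al, 0x0F                ; then fall through for the low nibble
print_nibble:                   ; AL = 0..15
    cmp al, 10
    jb .digit
    add al, 'A' - '0' - 10
.digit:
    add al, '0'
print_char:                     ; AL = character, all registers preserved
    pusha
    mov ah, 0x0E
    xor bx, bx
    int 0x10
    popa
    ret

times 510-($-$$) db 0x00
dw 0xAA55
```

On firmware that leaves the sector alone, nothing is printed. Otherwise each modified byte shows up as an `offset:value` pair in hex, which can be compared with the offsets mentioned above.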

Brendan
  • I'd guess that it incorrectly identifies a BPB because the first byte indicates a jmp. My guess is the BIOS code wasn't written to look for a relative jump only. Its assumption was bad: it decided there was a BPB and modified it. – Michael Petch Jan 24 '16 at 00:58
  • And in this case the far jmp isn't a stretch because a number of bootloaders do it to set cs:ip to expected values. – Michael Petch Jan 24 '16 at 01:00
  • @MichaelPetch: I don't know. It could just be `if( no partition table) { make false assumptions and trash things }` where it doesn't look at anything other than the partition table. – Brendan Jan 24 '16 at 01:03
  • Hello, what's your source for "0x1C to 0x1F and the byte at offset 0x24."? – travisjayday Aug 17 '17 at 12:25