
@MichaelPetch has rewritten the entire question to reduce it to a specific problem that should be easy to reproduce. The original question focused on a problem encountered while doing OS development in 64-bit long mode. The code attempted to use the 8042 PS/2 controller to reboot the machine; this worked in BOCHS but failed in QEMU. The original code can be found in this GitHub project.

Michael determined that the problem was not long mode specific. The problem space was substantially reduced to better illustrate the core issue.


For this demonstration I am:

  • Using the GRUB Multiboot2 specification to boot a 32-bit kernel.
  • Using the 8042 PS/2 controller to reboot the machine via the keyboard controller.
  • Creating an ELF executable and placing it in an ISO image so it can be booted as a CD-ROM.
  • Assuming for this code that the target machine/environment supports rebooting via the 8042 PS/2 controller.

The code for this demonstration is as follows:

bootloader.asm:

[BITS 32]

section .mboot
mboot_header_start:
    dd 0xe85250d6
    dd 0
    dd mboot_header_end - mboot_header_start
    dd 0x100000000 - (0xe85250d6 + 0 +(mboot_header_end - mboot_header_start))

    align 8
mboot_inforeq_start:
    dw 1
    dw 0
    dd mboot_inforeq_end - mboot_inforeq_start
    dd 2
    dd 6
    dd 8
mboot_inforeq_end:

    align 8
mboot_end_start:
    dw 0
    dw 0
    dd mboot_end_end - mboot_end_start
mboot_end_end:

mboot_header_end:

section .text
global _start
_start:
    mov word [0xb8000], (0x5f << 8) | 'B'
    mov word [0xb8002], (0x5f << 8) | 'O'
    mov word [0xb8004], (0x5f << 8) | 'O'
    mov word [0xb8006], (0x5f << 8) | 'T'

    ; Delay after writing to the screen so it appears for a bit of time before reboot
    mov ecx, 0xfffff
delay:
    loop delay

    ; Wait until the 8042 PS/2 Controller is ready to be sent a command
wait_cmd_ready:
    in al, 0x64
    test al, 00000010b
    jne  wait_cmd_ready

    ; Use 8042 PS/2 Controller to reboot the machine
    mov al, 0xfe
    out 0x64, al

    ; If this is displayed the reboot wasn't successful. Shouldn't get this far
    mov word [0xb8000+160], (0x5f << 8) | 'N'
    mov word [0xb8002+160], (0x5f << 8) | 'O'
    mov word [0xb8004+160], (0x5f << 8) | 'R'
    mov word [0xb8006+160], (0x5f << 8) | 'E'
    mov word [0xb8008+160], (0x5f << 8) | 'B'

    ; Infinite loop to end
hltloop:
    hlt
    jmp hltloop

link.ld:

ENTRY(_start);

kern_vma = 0x100000;

SECTIONS
{
    . = 0x500;
    .boot :
    {
        *(*.mboot*)
    }
    . = kern_vma;
    .text ALIGN(4K) :
    {
         *(*.text*)
    }
    .bss ALIGN(4K) :
    {
         *(.bss)
    }
}

My Linux build script is:

#!/bin/sh

ISO_DIR="isodir"
ISO_NAME="myos"
GRUB_CFG="grub.cfg"
KERNEL_NAME="bootloader"

nasm -f elf32 bootloader.asm -o bootloader.o
ld -m elf_i386 -T link.ld bootloader.o -o $KERNEL_NAME.elf

mkdir -p $ISO_DIR/boot/grub
cp $KERNEL_NAME.elf $ISO_DIR/boot/

echo 'set timeout=2'                          >  $ISO_DIR/boot/grub/$GRUB_CFG
echo 'set default=0'                          >> $ISO_DIR/boot/grub/$GRUB_CFG
echo 'menuentry "My Kernel" {'                >> $ISO_DIR/boot/grub/$GRUB_CFG
echo '  multiboot2 /boot/'$KERNEL_NAME'.elf'  >> $ISO_DIR/boot/grub/$GRUB_CFG
echo '}'                                      >> $ISO_DIR/boot/grub/$GRUB_CFG

# build iso image
grub-mkrescue -o $ISO_NAME.iso $ISO_DIR/

The Problem

When I run the build script and run it in QEMU with this command:

qemu-system-i386 -cdrom myos.iso

GRUB boots the kernel and BOOT is properly displayed with white-on-magenta attributes in the upper left-hand corner of the window. It should then wait briefly, reboot the machine, load GRUB again, and repeat the cycle.

That is not what happens. BOOT is displayed as it should be, but QEMU then appears to sit and do nothing instead of rebooting.

If I run QEMU with the additional option -d int I find that the machine appears to be in a perpetual loop consisting of an invalid opcode exception (v=06), a general protection fault (v=0d), and ends with a double fault (v=08). The output usually looks something like:

     0: v=06 e=0000 i=0 cpl=0 IP=0008:000f0000 pc=000f0000 SP=0010:00000fc0 env->regs[R_EAX]=00000000
EAX=00000000 EBX=00000000 ECX=00000010 EDX=000f171d
ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000fc0
EIP=000f0000 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0010 00000000 ffffffff 00cf9300 DPL=0 DS   [-WA]
CS =0008 00000000 ffffffff 00cf9b00 DPL=0 CS32 [-RA]
SS =0010 00000000 ffffffff 00cf9300 DPL=0 DS   [-WA]
DS =0010 00000000 ffffffff 00cf9300 DPL=0 DS   [-WA]
FS =0010 00000000 ffffffff 00cf9300 DPL=0 DS   [-WA]
GS =0010 00000000 ffffffff 00cf9300 DPL=0 DS   [-WA]
LDT=0000 00000000 0000ffff 00008200 DPL=0 LDT
TR =0000 00000000 0000ffff 00008b00 DPL=0 TSS32-busy
GDT=     000f6080 00000037
IDT=     000f60be 00000000
CR0=00000011 CR2=00000000 CR3=00000000 CR4=00000000
DR0=00000000 DR1=00000000 DR2=00000000 DR3=00000000
DR6=ffff0ff0 DR7=00000400
CCS=00000000 CCD=0000007f CCO=ADDB
EFER=0000000000000000
check_exception old: 0xffffffff new 0xd
     1: v=0d e=0032 i=0 cpl=0 IP=0008:000f0000 pc=000f0000 SP=0010:00000fc0 env->regs[R_EAX]=00000000
EAX=00000000 EBX=00000000 ECX=00000010 EDX=000f171d
ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000fc0
EIP=000f0000 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0010 00000000 ffffffff 00cf9300 DPL=0 DS   [-WA]
CS =0008 00000000 ffffffff 00cf9b00 DPL=0 CS32 [-RA]
SS =0010 00000000 ffffffff 00cf9300 DPL=0 DS   [-WA]
DS =0010 00000000 ffffffff 00cf9300 DPL=0 DS   [-WA]
FS =0010 00000000 ffffffff 00cf9300 DPL=0 DS   [-WA]
GS =0010 00000000 ffffffff 00cf9300 DPL=0 DS   [-WA]
LDT=0000 00000000 0000ffff 00008200 DPL=0 LDT
TR =0000 00000000 0000ffff 00008b00 DPL=0 TSS32-busy
GDT=     000f6080 00000037
IDT=     000f60be 00000000
CR0=00000011 CR2=00000000 CR3=00000000 CR4=00000000
DR0=00000000 DR1=00000000 DR2=00000000 DR3=00000000
DR6=ffff0ff0 DR7=00000400
CCS=00000000 CCD=0000007f CCO=ADDB
EFER=0000000000000000
check_exception old: 0xd new 0xd
     2: v=08 e=0000 i=0 cpl=0 IP=0008:000f0000 pc=000f0000 SP=0010:00000fc0 env->regs[R_EAX]=00000000
EAX=00000000 EBX=00000000 ECX=00000010 EDX=000f171d
ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000fc0
EIP=000f0000 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0010 00000000 ffffffff 00cf9300 DPL=0 DS   [-WA]
CS =0008 00000000 ffffffff 00cf9b00 DPL=0 CS32 [-RA]
SS =0010 00000000 ffffffff 00cf9300 DPL=0 DS   [-WA]
DS =0010 00000000 ffffffff 00cf9300 DPL=0 DS   [-WA]
FS =0010 00000000 ffffffff 00cf9300 DPL=0 DS   [-WA]
GS =0010 00000000 ffffffff 00cf9300 DPL=0 DS   [-WA]
LDT=0000 00000000 0000ffff 00008200 DPL=0 LDT
TR =0000 00000000 0000ffff 00008b00 DPL=0 TSS32-busy
GDT=     000f6080 00000037
IDT=     000f60be 00000000
CR0=00000011 CR2=00000000 CR3=00000000 CR4=00000000
DR0=00000000 DR1=00000000 DR2=00000000 DR3=00000000
DR6=ffff0ff0 DR7=00000400
CCS=00000000 CCD=0000007f CCO=ADDB
EFER=0000000000000000
check_exception old: 0x8 new 0xd
check_exception old: 0xffffffff new 0x6

It keeps repeating a similar pattern. What is unusual is that it seems to be stuck in this cycle of exceptions:

0: v=06 e=0000 i=0 cpl=0 IP=0008:000f0000 pc=000f0000 SP=0010:00000fc0 env->regs[R_EAX]=00000000
1: v=0d e=0032 i=0 cpl=0 IP=0008:000f0000 pc=000f0000 SP=0010:00000fc0 env->regs[R_EAX]=0000
2: v=08 e=0000 i=0 cpl=0 IP=0008:000f0000 pc=000f0000 SP=0010:00000fc0 env->regs[R_EAX]=0000
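
One way to confirm the cycle without reading the whole dump is to tally the vector numbers in the log. The following is only a sketch: the function name count_vectors and the log file name qemu.log are made up for illustration, and it assumes the log was written with QEMU's -D option and that grep supports -o (GNU and BSD grep do).

```shell
#!/bin/sh
# Sketch: tally the exception vectors (the v=XX field) in a QEMU "-d int" log.
# Assumed way of producing the log (qemu.log is a hypothetical file name):
#   qemu-system-i386 -cdrom myos.iso -d int -D qemu.log
count_vectors() {
    # $1 = path to the log file
    # Prints one "v=XX count" pair per line, sorted by vector.
    grep -o 'v=[0-9a-f][0-9a-f]' "$1" | LC_ALL=C sort | uniq -c | awk '{ print $2, $1 }'
}
```

Calling count_vectors qemu.log after letting QEMU run for a second or two should show the counts for v=06, v=0d and v=08 growing together if the machine really is stuck in the loop described above.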

What is causing this problem and how can I fix it so that QEMU will properly reboot and launch GRUB again?

Michael Petch
alberinfo
  • `add %al, (%eax)`, and `add %al, (%bx, %si)` are disassembly of `00 00` bytes in 32 or 16-bit mode, respectively. So it's a very specific kind of garbage, and you're disassembling it in 2 modes other than long mode. – Peter Cordes May 20 '20 at 03:28
  • This isn't really a [mcve]; off-site links to your code are not sufficient for Stack Overflow rules. It's short enough that you should include the code directly in a code block, as well as linking your project on github in case anyone wants to see the full source and/or debug it for you. – Peter Cordes May 20 '20 at 03:30
  • Ok, so basically add %al, (%eax) and add %al, (%bx, %si) are the disassemble of 00 00 in pmode and real mode respectively. Is there any way that qemu actually rebooots and doesnt just execute garbage? And about the minimal reproducible example, i didnt notced it when i wrote the question. Ill remember it for the next posts (if i make new ones) – alberinfo May 20 '20 at 04:27
  • Sorry, those kinds of details like how QEMU reboots, or how BOCHS vs. QEMU might differ slightly, aren't something I know about. Most of what I know about asm is performance tuning, not osdev HW interaction / details of legacy IBM PC-compatible legacy stuff. – Peter Cordes May 20 '20 at 04:31
  • *Ill remember it for the next posts* - No, you need to [edit] this post if you want anyone to bother reading it and helping you. – Peter Cordes May 20 '20 at 04:32
  • Thanks for advicing, both for the missing directories and for the interrupt handler. Could you tell me whats the problem in the ISR handler? even though it wont break without optimizations, i'd like to make sure its correct – alberinfo May 20 '20 at 11:09
  • ok thanks for advicing. Even though i just tried it and register values were wronger than i've ever seen'em, ill make a note in my code so i remember it if something related happens. for now i will try to fix this issue with qemu, and then i'll probably finish things with isrs (what you said, and things like handling error codes, getting rip register, which seems to be a real pain). – alberinfo May 20 '20 at 12:29
  • yeah, i upload all the folders with all the files and github either doesn't catch them or it says that they are ready to upload, but when i upload they aren't there – alberinfo May 20 '20 at 20:25
  • ok, so now it should work. I uploaded with git instead of using github directly – alberinfo May 20 '20 at 21:50
  • Wow, i would never think that placing grub before 1M would cause this problems. Fixed. About the ESP, seems that i missed it when writing the file that handles the first time from grub. You have to post the answer and i have to mark it as the right answer, right? – alberinfo May 21 '20 at 18:04
  • no not actually; Edit what you consider it has to be edited. It will help me and everyone that is seeing this question – alberinfo May 21 '20 at 20:53

1 Answer


Finding the root cause of this problem in the original operating system code took some effort. The reduced example makes it easier for people to spot the culprit.


The primary problem is that you have placed the Multiboot section (with the Multiboot2 header) at Virtual Memory Address (VMA) 0x500 in the linker script with this:

SECTIONS
{
    . = 0x500;
    .boot :
    {
        *(*.mboot*)
    }
    . = kern_vma;
    .text ALIGN(4K) :
    {
         *(*.text*)
    }
    .bss ALIGN(4K) :
    {
         *(.bss)
    }
}

The VMA and Load Memory Address (LMA) are both set to 0x500 when the .boot section is emitted by the linker, so GRUB will attempt to load this section at memory address 0x500 when reading your ELF executable. Placing any code or data below 0x100000 is a very bad idea: GRUB uses this area of memory for all its boot-related tasks, including loading your kernel. You may be inadvertently overwriting memory that GRUB is using, potentially putting the machine in an undefined state. It may work on some machines running GRUB and not on others, and the problems may manifest in other ways.
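
If you want to check where the linker has placed each loadable segment, the program headers can be inspected with readelf. The sketch below is an illustration only: the helper name low_load_segments is made up, and it assumes GNU readelf's usual single-line layout for `readelf -lW` (Type, Offset, VirtAddr, PhysAddr, ...), where PhysAddr is the fourth column.

```shell
#!/bin/sh
# Sketch: report LOAD segments whose physical (load) address is below 1 MiB.
# Reads the text output of "readelf -lW <file>" on standard input.
# Assumed usage (bootloader.elf is the file produced by the build script):
#   readelf -lW bootloader.elf | low_load_segments
low_load_segments() {
    limit=$((0x100000))
    while read -r type offset vaddr paddr rest; do
        # Only program header lines of type LOAD are of interest
        [ "$type" = "LOAD" ] || continue
        if [ $(($paddr)) -lt "$limit" ]; then
            echo "LOAD segment at physical address $paddr is below 1 MiB"
        fi
    done
}
```

With the linker script from the question, a segment at 0x500 would be expected to show up in the report; no output means every LOAD segment is at or above 0x100000.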

The only rules in the Multiboot2 specification about the location of the Multiboot2 header are that it must be aligned on a quadword (64-bit) boundary and must fall within the first 32,768 bytes of the ELF file. You don't need to place the Multiboot2 header below 0x100000; that is unnecessary. Place it before your code and data, above 0x100000. This should work:

ENTRY(_start);

kern_vma = 0x100000;

SECTIONS
{
    . = kern_vma;
    .boot :
    {
        *(*.mboot*)
    }
    .text ALIGN(4K) :
    {
         *(*.text*)
    }
    .bss ALIGN(4K) :
    {
         *(.bss)
    }
}
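
To sanity-check the placement rule stated above (64-bit alignment, within the first 32,768 bytes of the file), the ELF file can be scanned for the Multiboot2 magic value. This is a rough sketch rather than a full validator: the function name find_mb2_header is made up, it only looks for the magic 0xe85250d6 (stored little-endian as the bytes d6 50 52 e8) at 8-byte-aligned file offsets, and it does not verify the checksum or that the whole header fits inside the limit.

```shell
#!/bin/sh
# Sketch: print the file offset of the first 8-byte-aligned Multiboot2 magic
# found in the first 32768 bytes of a file; prints nothing if none is found.
# Assumed usage: find_mb2_header bootloader.elf
find_mb2_header() {
    # od prints 16 bytes per line with a decimal offset in the first field,
    # so 8-byte-aligned positions start at fields 2 and 10 of each line.
    od -A d -v -t x1 -N 32768 "$1" | awk '
        $2  == "d6" && $3  == "50" && $4  == "52" && $5  == "e8" { print $1 + 0; exit }
        $10 == "d6" && $11 == "50" && $12 == "52" && $13 == "e8" { print $1 + 8; exit }'
}
```

An empty result suggests the header is either misaligned or too far into the file for a Multiboot2 loader to find it.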

In your case, loading data at 0x500 (below 0x100000) prevents QEMU from rebooting properly, which in turn prevents GRUB from restarting. I'm not sure of the exact nature of the failure, but the results are pretty apparent in this case.

The same thing applies when booting with the original Multiboot specification, although the Multiboot header only has to be aligned on a doubleword (32-bit) boundary and be within the first 8192 bytes of the ELF executable.

Michael Petch