I'm working on a simple kernel and I've been trying to implement a keyboard interrupt handler to get rid of port polling. I've been using QEMU in -kernel mode (to reduce compile time, because generating the iso using grub-mkrescue takes quite some time) and it worked just fine, but when I wanted to switch to -cdrom mode it suddenly started crashing. I had no idea why.

Eventually I realized that when it boots from an ISO it also runs the GRUB bootloader before booting the kernel itself. I figured GRUB probably switches the processor into protected mode and that this causes the problem.

The problem: Normally I'd simply initialize the interrupt handler, and whenever I pressed a key it would be handled. However, when I ran my kernel from an ISO and pressed a key, the virtual machine simply crashed. This happened in both QEMU and VMWare, so I assume there must be something wrong with my interrupts.

Bear in mind that the code works just fine as long as I don't use GRUB. interrupts_init() (see below) is one of the first things called in the kernel_main() function.

Essentially the question is: Is there a way to make this work in protected mode?

A complete copy of my kernel can be found in my GitHub repository. Some relevant files:

lowlevel.asm:

section .text

global keyboard_handler_int
global load_idt

extern keyboard_handler

keyboard_handler_int:
    pushad
    cld
    call keyboard_handler
    popad
    iretd

load_idt:
    mov edx, [esp + 4]
    lidt [edx]
    sti
    ret

interrupts.c:

#include <assembly.h> // defines inb() and outb()

#define IDT_SIZE 256
#define PIC_1_CTRL 0x20
#define PIC_2_CTRL 0xA0
#define PIC_1_DATA 0x21
#define PIC_2_DATA 0xA1

extern void keyboard_handler_int(void);
extern void load_idt(void*);

struct idt_entry
{
    unsigned short int offset_lowerbits;
    unsigned short int selector;
    unsigned char zero;
    unsigned char flags;
    unsigned short int offset_higherbits;
} __attribute__((packed));

struct idt_pointer
{
    unsigned short limit;
    unsigned int base;
} __attribute__((packed));

struct idt_entry idt_table[IDT_SIZE];
struct idt_pointer idt_ptr;

void load_idt_entry(int isr_number, unsigned long base, short int selector, unsigned char flags)
{
    idt_table[isr_number].offset_lowerbits = base & 0xFFFF;
    idt_table[isr_number].offset_higherbits = (base >> 16) & 0xFFFF;
    idt_table[isr_number].selector = selector;
    idt_table[isr_number].flags = flags;
    idt_table[isr_number].zero = 0;
}

static void initialize_idt_pointer()
{
    idt_ptr.limit = (sizeof(struct idt_entry) * IDT_SIZE) - 1;
    idt_ptr.base = (unsigned int)&idt_table;
}

static void initialize_pic()
{
    /* ICW1 - begin initialization */
    outb(PIC_1_CTRL, 0x11);
    outb(PIC_2_CTRL, 0x11);

    /* ICW2 - remap offset address of idt_table */
    /*
    * In x86 protected mode, we have to remap the PICs beyond 0x20 because
    * Intel have designated the first 32 interrupts as "reserved" for cpu exceptions
    */
    outb(PIC_1_DATA, 0x20);
    outb(PIC_2_DATA, 0x28);

    /* ICW3 - setup cascading */
    outb(PIC_1_DATA, 0x00);
    outb(PIC_2_DATA, 0x00);

    /* ICW4 - environment info */
    outb(PIC_1_DATA, 0x01);
    outb(PIC_2_DATA, 0x01);
    /* Initialization finished */

    /* mask interrupts */
    outb(0x21 , 0xFF);
    outb(0xA1 , 0xFF);
}

void idt_init(void)
{
    initialize_pic();
    initialize_idt_pointer();
    load_idt(&idt_ptr);
}

void interrupts_init(void)
{
    idt_init();
    load_idt_entry(0x21, (unsigned long) keyboard_handler_int, 0x08, 0x8E);

    /* 0xFD is 11111101 - enables only IRQ1 (keyboard)*/
    outb(0x21 , 0xFD);
}

kernel.c:

#if defined(__linux__)
    #error "You are not using a cross-compiler, you will most certainly run into trouble!"
#endif

#if !defined(__i386__)
    #error "This kernel needs to be compiled with a ix86-elf compiler!"
#endif

#include <kernel.h>

// These _init() functions are not in their respective headers because
// they're supposed to be never called from anywhere else than from here

void term_init(void);
void mem_init(void);
void dev_init(void);

void interrupts_init(void);
void shell_init(void);

void kernel_main(void)
{
    // Initialize basic components
    term_init();
    mem_init();
    dev_init();
    interrupts_init();

    // Start the Shell module
    shell_init();

    // This should be unreachable code
    kernel_panic("End of kernel reached!");
}

boot.asm:

bits 32
section .text
;grub bootloader header
        align 4
        dd 0x1BADB002            ;magic
        dd 0x00                  ;flags
        dd - (0x1BADB002 + 0x00) ;checksum. m+f+c should be zero

global start
extern kernel_main

start:
  mov esp, stack_space  ;set stack pointer
  call kernel_main

; We shouldn't get to here, but just in case do an infinite loop
endloop:
  hlt           ;halt the CPU
  jmp endloop

section .bss
resb 8192       ;8KB for stack
stack_space:
Michael Petch
natiiix
  • I feel for you. It is in all likelihood a bug in your own software. The issue is that the code you show doesn't really make this a minimal complete verifiable example. It's nearly impossible to say (we don't even see the keyboard handler, how you set up the GDT/IDT, etc). I'd like to be able to help. If you had this in a Git repository for example I'd be able to test it and see what I can find. I'm assuming there is too much code to publish everything in a single question? – Michael Petch Apr 02 '17 at 05:26
  • You know something looked vaguely familiar about this code until I searched my local files and discovered code that seems awfully familiar. Have you tried the code in this answer: http://stackoverflow.com/a/37635449/3857942 . Your file names seem very similar to the ones in the question I answered last year. My answer includes a minimal complete set of files. I wonder if that code works when you boot from an ISO – Michael Petch Apr 02 '17 at 05:41
  • Yes, GRUB will switch into protected mode when booting a Multiboot-compliant kernel. If you ever reload any of the selector registers (CS, DS, ES) with a Multiboot-compliant kernel you need to make sure you set up your own GDT as well. If you don't change them then you are okay. – Michael Petch Apr 02 '17 at 05:47
  • If your code is anything like that answer of mine then I have a suspicion I know what may be wrong, and it may even be related to my last comment about the GDT. The Multiboot spec contains this important note: _‘GDTR’ Even though the segment registers are set up as described above, the ‘GDTR’ may be invalid, so the OS image must not load any segment registers (even just reloading the same values!) until it sets up its own ‘GDT’._ One possibility is that the failure happens when an IRQ fires, since the CS selector gets loaded when an interrupt occurs. A bad GDTR could cause the failure. – Michael Petch Apr 02 '17 at 06:50
  • It is 1:30am here. I was able to reproduce what is probably your problem. My best guess is you need to manually set up your own GDT before you set up the IDT. It appears the _QEMU_ multiboot loader (used with `-kernel`) is more forgiving about the _GDTR/GDT_ . I was able to hack in a _GDT_ and it seemed to resolve the issue. Later in the day I'll update my answer at the other link. – Michael Petch Apr 02 '17 at 07:29
  • @MichaelPetch This IS the code from that answer, I forgot to mention it. – natiiix Apr 02 '17 at 08:16
  • @MichaelPetch Yes, the code is on GitHub, but I assumed it wouldn't be particularly helpful so I've decided not to post it. Anyways, it's [link](https://github.com/natiiix/KokOS), the important part is in /src/drivers/io/keyboard.c and /src/drivers/interrupts.c, all Assembly code is in /src/core/. I'll push the repo ASAP, it's currently slightly outdated. – natiiix Apr 02 '17 at 08:20
  • I sent you a pull request on your Github repo with a [change](https://github.com/natiiix/KokOS/pull/1/commits/83c17e6a3546cb46748b1941dc89fc3c7cc84056) that I hope resolves your issue. It sets up a GDT with basic 4gb flat descriptors and calls the `load_gdt` function from your `kernel_main` function – Michael Petch Apr 02 '17 at 15:23

1 Answer

I had a hunch last night as to why loading through GRUB and loading through the Multiboot -kernel feature of QEMU might not work as expected. That is captured in the comments. I have managed to confirm the findings based on more of the source code being released by the OP.

In the Multiboot Specification there is a relevant note about the GDTR and the GDT with regard to modifying selectors:

GDTR

Even though the segment registers are set up as described above, the ‘GDTR’ may be invalid, so the OS image must not load any segment registers (even just reloading the same values!) until it sets up its own ‘GDT’.

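When an interrupt fires, the CPU loads CS with the selector stored in the IDT entry, so taking an interrupt counts as loading a segment register and can fail if the GDTR is invalid.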

There is another concern and most likely the root cause of problems. The Multiboot specification also states this about the selectors it creates in its GDT:

‘CS’
Must be a 32-bit read/execute code segment with an offset of ‘0’ and a
limit of ‘0xFFFFFFFF’. The exact value is undefined. 
‘DS’
‘ES’
‘FS’
‘GS’
‘SS’
Must be a 32-bit read/write data segment with an offset of ‘0’ and a limit
of ‘0xFFFFFFFF’. The exact values are all undefined. 

Although it says what types of descriptors will be set up, it doesn't actually specify that a descriptor has to sit at a particular position in the GDT. One Multiboot loader may have its code segment descriptor at selector 0x08 and another bootloader may use 0x10. This is of particular relevance when you look at one line of your code:

load_idt_entry(0x21, (unsigned long) keyboard_handler_int, 0x08, 0x8E);

This creates an IDT descriptor for interrupt 0x21. The third parameter, 0x08, is the code selector the CPU uses to reach the interrupt handler. I discovered this works on QEMU, where the code selector is 0x08, but under GRUB the code selector appears to be 0x10. With GRUB, selector 0x08 therefore does not refer to the 32-bit code segment, and dispatching the interrupt through it will not work.
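To make the two hard-coded values in that call easier to read, here is a small sketch in plain C that splits a selector and an IDT gate flags byte into their fields. The struct and function names are hypothetical helpers for illustration and are not part of the kernel in the question. The bit layout follows the standard x86 definitions.

```c
#include <stdint.h>

/* Fields of an x86 segment selector (hypothetical helper type). */
struct selector_fields {
    uint16_t index; /* descriptor index within the table (bits 15..3) */
    uint8_t  ti;    /* table indicator: 0 = GDT, 1 = LDT (bit 2) */
    uint8_t  rpl;   /* requested privilege level (bits 1..0) */
};

static struct selector_fields decode_selector(uint16_t sel)
{
    struct selector_fields f;
    f.index = (uint16_t)(sel >> 3);
    f.ti    = (uint8_t)((sel >> 2) & 0x1);
    f.rpl   = (uint8_t)(sel & 0x3);
    return f;
}

/* Fields of the type/attribute byte of an IDT gate (hypothetical helper type). */
struct gate_flags {
    uint8_t present; /* bit 7: gate is valid */
    uint8_t dpl;     /* bits 6..5: privilege level required to invoke it */
    uint8_t type;    /* bits 3..0: 0xE means 32-bit interrupt gate */
};

static struct gate_flags decode_gate_flags(uint8_t flags)
{
    struct gate_flags g;
    g.present = (uint8_t)((flags >> 7) & 0x1);
    g.dpl     = (uint8_t)((flags >> 5) & 0x3);
    g.type    = (uint8_t)(flags & 0xF);
    return g;
}
```

With these helpers, selector 0x08 decodes to GDT index 1 and 0x10 to GDT index 2, both with RPL 0, and the flags byte 0x8E decodes to a present, ring 0, 32-bit interrupt gate. Which descriptor actually lives at index 1 depends entirely on whose GDT is loaded.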

To get around all these problems the best thing to do is set up your own GDT shortly after starting up your kernel and before setting up an IDT and enabling interrupts. There is a tutorial on the GDT in the OSDev Wiki if you want more information.

To set up a GDT I'll simply create an assembler routine in lowlevel.asm to do it by adding a load_gdt function and data structures:

global load_gdt

; GDT with a NULL Descriptor, a 32-Bit code Descriptor
; and a 32-bit Data Descriptor
gdt_start:
gdt_null:
    dd 0x0
    dd 0x0

gdt_code:
    dw 0xffff
    dw 0x0
    db 0x0
    db 10011010b
    db 11001111b
    db 0x0

gdt_data:
    dw 0xffff
    dw 0x0
    db 0x0
    db 10010010b
    db 11001111b
    db 0x0
gdt_end:

; GDT descriptor record
gdt_descriptor:
    dw gdt_end - gdt_start - 1
    dd gdt_start

CODE_SEG equ gdt_code - gdt_start
DATA_SEG equ gdt_data - gdt_start

; Load GDT and set selectors for a flat memory model
load_gdt:
    lgdt [gdt_descriptor]
    jmp CODE_SEG:.setcs              ; Set CS selector with far JMP
.setcs:
    mov eax, DATA_SEG                ; Set the Data selectors to defaults
    mov ds, eax
    mov es, eax
    mov fs, eax
    mov gs, eax
    mov ss, eax
    ret

This creates and loads a GDT that has a NULL Descriptor at index 0x00, a 32-bit code descriptor at 0x08, and a 32-bit data descriptor at 0x10. Since we are using 0x08 as the code selector this matches what you specify as a code selector in your IDT entry initialization for interrupt 0x21:
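For anyone who wants to check the raw bytes in the table above, the following is a minimal sketch in C that packs a base, limit, access byte, and flags nibble into the 8-byte descriptor format. The function name `gdt_encode` is a hypothetical helper, not something from the kernel or the pull request. It is meant only to illustrate where each field lands.

```c
#include <stdint.h>

/*
 * Pack a segment descriptor into its 8-byte in-memory form, returned as a
 * little-endian 64-bit value. Hypothetical helper for illustration only.
 *   base   - 32-bit segment base
 *   limit  - 20-bit segment limit (upper bits are ignored)
 *   access - access byte (present, DPL, type)
 *   flags  - upper nibble of byte 6 (granularity, size, etc.)
 */
static uint64_t gdt_encode(uint32_t base, uint32_t limit,
                           uint8_t access, uint8_t flags)
{
    uint64_t d = 0;
    d |= (uint64_t)(limit & 0xFFFF);             /* limit 15..0  -> bits 15..0  */
    d |= (uint64_t)(base & 0xFFFFFF) << 16;      /* base 23..0   -> bits 39..16 */
    d |= (uint64_t)access << 40;                 /* access byte  -> bits 47..40 */
    d |= (uint64_t)((limit >> 16) & 0xF) << 48;  /* limit 19..16 -> bits 51..48 */
    d |= (uint64_t)(flags & 0xF) << 52;          /* flags nibble -> bits 55..52 */
    d |= (uint64_t)((base >> 24) & 0xFF) << 56;  /* base 31..24  -> bits 63..56 */
    return d;
}
```

Calling `gdt_encode(0, 0xFFFFF, 0x9A, 0xC)` gives 0x00CF9A000000FFFF, which is the same byte sequence as `gdt_code`, and `gdt_encode(0, 0xFFFFF, 0x92, 0xC)` gives 0x00CF92000000FFFF, matching `gdt_data`.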

load_idt_entry(0x21, (unsigned long) keyboard_handler_int, 0x08, 0x8E);

The only other thing is that you'll need to amend your kernel.c to call load_gdt. One can do that with something like:

extern void load_gdt(void);

void kernel_main(void)
{
    // Initialize basic components
    load_gdt();
    term_init();
    mem_init();
    dev_init();
    interrupts_init();

    // Start the Shell module
    shell_init();

    // This should be unreachable code
    kernel_panic("End of kernel reached!");
}
Michael Petch
  • Thank you very much. Your answer came unexpectedly fast and it perfectly explained the problem. However I've noticed a strange phenomenon. In both QEMU and VMWare the VGA cursor disappeared when booting your modified code. I have confirmed that this doesn't happen in QEMU in `-kernel` mode. – natiiix Apr 02 '17 at 20:24
  • I'd have to look at the cursor issue a bit later when I have a bit more time. Likely my code isn't the cause, but probably something related to how GRUB sets up the graphics mode (I am just guessing at this point until I have time to look) – Michael Petch Apr 02 '17 at 20:28
  • It indeed seems that this happens even when I comment out all the new code. I don't remember this happening before, but it's possible that I simply haven't noticed it somehow. GRUB sure brings in a bunch of mysteries for me. – natiiix Apr 02 '17 at 20:45
  • @user3043260 : Until I find a good solution, you can probably get the cursor to show by changing the `timeout` value in `grub.cfg` from 0 to 1. This is a quick hack – Michael Petch Apr 02 '17 at 22:15
  • I've added that only very recently, so you think that might be the problem? It would explain why I haven't noticed for sure. Thanks for your time, man. I appreciate it tons! – natiiix Apr 03 '17 at 05:48
  • The hack actually makes the menu show up (a timeout of 0 doesn't make it appear). By making the menu appear, GRUB enables the cursor and sets it to an underline-type character. The way to fix this properly is to enable the cursor (in your kernel terminal initialization) through the video hardware ports and then tell it on which scanlines of the character cell the cursor should appear. Tomorrow when I have a chance I'll post some code. The problem with no cursor appears when going through GRUB. – Michael Petch Apr 03 '17 at 05:50
  • I know what it does, that's why I've added it. Because I found it ridiculous to have the menu there with a single option. I guess in QEMU `-kernel` mode it happens because of QEMU's bootloader which also writes some text on the screen and displays the cursor. I'll try to look up the code somewhere. – natiiix Apr 03 '17 at 11:27
  • I thought I'd use code from [this answer](http://stackoverflow.com/a/25322075/3043260) but it seems to disable the cursor instead of enabling it. That means that even in `-kernel` mode it can't be seen now. – natiiix Apr 03 '17 at 11:48
  • Never mind, figured it out. It was easier to do than I expected. `outb(0x3D4, 0x0A); outb(0x3D5, 0xE & 0xEF); outb(0x3D4, 0x0B); outb(0x3D5, 0xF);` Thanks for your help nonetheless! :) – natiiix Apr 03 '17 at 12:09
  • @user3043260 I actually pushed a cursor mode change to your repo last night. You must have missed it. – Michael Petch Apr 03 '17 at 15:48
  • Oh, I had no idea, because I don't check GitHub normally. I usually perform all actions from my IDE. Thanks for that. I wish GitHub would notify me by an email when something happens to one of my repositories. – natiiix Apr 03 '17 at 18:01
  • I receive notifications from GitHub when activity occurs associated with my account. Is it possible it is going to spam? Going to the wrong email address, etc.? – Michael Petch Apr 03 '17 at 18:03
  • What's the purpose of rewriting 0x0A with the original value | 0x20 if you rewrite it immediately afterwards with 0x0D? (regarding the email) Oh, aha, sorry about that, I had it disabled for some mysterious reason. – natiiix Apr 03 '17 at 18:06
  • @user3043260 : because in my rush to combine two pieces of code I overlooked the fact that I clobbered it (albeit still enabling the cursor by setting the CD bit to 00). The start scan line processing should be moved up into the code that turns off the cursor disabled bit. It was done in the middle of the night and I was probably too tired to notice lol – Michael Petch Apr 03 '17 at 18:16
  • @user3043260 : I made a [new pull request](https://github.com/natiiix/KokOS/pulls) for my original code that properly implements what I was **trying** to do last night. Sorry about that. – Michael Petch Apr 03 '17 at 18:59
  • Okay, well now it's essentially what I had a couple of hours ago, but is it really necessary to keep those high bits? According to the VGA documentation I've found, aside from the cursor disable bit in 0xA and 2 bits (5., 6.) in 0xB there's nothing else in those registers. – natiiix Apr 03 '17 at 20:45
  • @user3043260 Some of the documentation (regarding 0x0B) suggests it may matter if you were to be running on old hardware with EGA.Keeping them the same by retaining the previous bit values is an attempt to reduce the possible problems you may encounter. Likely it will never be an issue but I'd rather be safer than sorry. The enable cursor bit is part of the (0x0A) processing.That is separate from the meaning of (0x0B). You'll notice in my final code I didn't care about the upper bits (beside cursor enable) with 0x0A.In my final code I was only worried about keeping the upper bits same for 0x0B – Michael Petch Apr 03 '17 at 20:59
  • Yeah, I've read that, but I thought it wouldn't matter nowadays, but I understand it. I've noticed yet another mysterious behavior. This time I have truly no idea what might be causing it. The cursor isn't blinking. However it doesn't happen in QEMU at all, not even in `-cdrom` mode. This makes no sense, because the blinking shouldn't be possible to disable, I'm really confused. Tested in VirtualBox, same behavior. – natiiix Apr 03 '17 at 21:22
  • @user3043260 : Some non-standard VGA controllers do have support for the ability to change the rate at which the cursor blinks. Some virtual environments won't support a blinking hardware emulated cursor no matter what. I'm a bit confused. Does it blink in QEMU? Which environments doesn't it blink in? Specific to VirtualBox - they don't support a blinking cursor according to this: http://virtualbox.org/ticket/7923 – Michael Petch Apr 03 '17 at 23:58
  • I believe it didn't blink in VMWare Workstation the last time I tried either. But it doesn't hint any problem, does it? – natiiix Apr 04 '17 at 08:41
  • QEMU is blinking just fine in both modes and VMWare doesn't blink, but it also crashes often so I don't know what's up with that. VirtualBox doesn't blink as mentioned above. I'm not brave enough to run it on real hardware now as the VMWare crashing is kinda scary. – natiiix Apr 04 '17 at 09:46
  • @user3043260 Did you see the link to VirtualBox in my earlier comment? There was a bug report saying the cursor does not blink by design (to avoid wasting computing resources). The same is true for many virtual machines. Many don't even honor text mode blinking text either. If you test on real hardware you are much more likely to find a blinking cursor. A blinking cursor is just something you can't rely on from a virtual machine, so it shouldn't be a concern. – Michael Petch Apr 04 '17 at 14:27
  • @user3043260 As for crashes that's not uncommon in OS Development. Part of the fun/pain is debugging them. – Michael Petch Apr 04 '17 at 14:30
  • Yeah, I've read that, but I'm kinda certain that VMWare Workstation in fact supports text mode cursor blinking, but whatever. It's a little worrying that VMWare is the only crashing environment. It works perfectly fine in QEMU in every mode and also in VirtualBox. It doesn't even give me any error, it straight up crashes VMWare and it completely stops responding before the BIOS even shows up. It's quite surprising from a business-grade software. – natiiix Apr 04 '17 at 16:53
  • VMWare is a solid product. If it is crashing it is almost inevitably something in your code. I've used it for many years for OS development – Michael Petch Apr 04 '17 at 16:55
  • It's crashing when booting a reasonably clean and fresh installation of Ubuntu too, so I guess it's more of a compatibility issue or it doesn't like some of my drivers or something along those lines. (I'm a Windows 10 Insider running quite volatile versions of the OS even though it's not particularly wise and even Microsoft has warned me.) Usually killing the vmware-vmx.exe process and restarting the Workstation solves the problem. I've had some processor faults going on before and it never crashed the whole machine, it just gave me an error during runtime, but the BIOS always POSTed just fine. – natiiix Apr 04 '17 at 20:29
  • For *months* (yes, months), every time I've even thought of experimenting with writing a small kernel, I get stuck at the GDT. I swear I understand the concept, and have written routines for it in both C and Assembly. What I MISSED, though, was the approach you took. Whereas I had been using runtime computations to make my GDT, I should have just declared it in .data like you. Thanks!!! – Tobe Osakwe Jul 16 '18 at 17:50
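
For readers following the cursor discussion in the comments above, the register values can be sketched as two small C helpers. This is only an illustration under assumptions: the function names are hypothetical and this is not the code from the pull request. It assumes the standard VGA CRTC layout, where register 0x0A holds the cursor disable bit (bit 5) and the start scanline (bits 4..0), and register 0x0B holds the end scanline (bits 4..0) with other bits above it.

```c
#include <stdint.h>

/*
 * Value for CRTC register 0x0A (Cursor Start).
 * Masking to the low 5 bits leaves bit 5 (cursor disable) clear,
 * which enables the cursor. Hypothetical helper name.
 */
static uint8_t cursor_start_value(uint8_t start_scanline)
{
    return (uint8_t)(start_scanline & 0x1F);
}

/*
 * Value for CRTC register 0x0B (Cursor End).
 * Keeps the upper three bits of the previously read value unchanged,
 * as suggested in the comments, and replaces only the end scanline.
 * Hypothetical helper name.
 */
static uint8_t cursor_end_value(uint8_t previous, uint8_t end_scanline)
{
    return (uint8_t)((previous & 0xE0) | (end_scanline & 0x1F));
}
```

In a kernel these values would be written through the index/data port pair 0x3D4/0x3D5 using the `outb()` and `inb()` helpers from the question, for example selecting index 0x0A and writing `cursor_start_value(0x0E)`, then selecting index 0x0B, reading the old value, and writing `cursor_end_value(old, 0x0F)`.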