
I've got a small x86-64 kernel that I boot with UEFI. At first I loaded the kernel directly at 0x400000, but then I wanted a higher-half kernel because I feel it is the proper way to do things. I decided to split the kernel into two parts: startup.elf (the startup code) and kernel.elf (the main kernel). The startup code is supposed to set up identity paging for the first 4MB, plus the paging structures for the higher-half kernel. I plan on mapping 0x8000_0000_0000 to 0 and mapping all RAM from there.
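To make the plan above concrete, here is a minimal host-side sketch (not part of the startup code) of how a virtual address splits into the four table indices under 4-level paging. The helper names `table_index` and `is_canonical` are hypothetical, and the sketch assumes 48-bit virtual addresses. Under that assumption, 0x0000_8000_0000_0000 is not a canonical address, so the higher half would conventionally start at 0xFFFF_8000_0000_0000.

```cpp
#include <cstdint>

// Index into one paging level; each level consumes 9 bits of the address.
// level 0 = page table, 1 = page directory, 2 = PDPT, 3 = PML4.
constexpr std::uint64_t table_index(std::uint64_t vaddr, unsigned level) {
    return (vaddr >> (12 + 9 * level)) & 0x1FF;
}

// Assumption: 48-bit virtual addresses, so bits 63..47 must be
// either all zeros (lower half) or all ones (higher half).
constexpr bool is_canonical(std::uint64_t vaddr) {
    return (vaddr >> 47) == 0 || (vaddr >> 47) == 0x1FFFF;
}
```

For example, the last page of the 4MB identity map (0x3FF000) uses PML4 entry 0, PDPT entry 0, page directory entry 1 and page table entry 511, while 0xFFFF_8000_0000_0000 would land in PML4 entry 256.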

I already had code to set up the GDT and to do a far return to reload CS (taken from the question "Change GDT and update CS while in long mode"). I also had code to set up paging. Everything was working fine, and I was working on setting up the xHC, which was triggering interrupts as expected.

Now my code doesn't work: it triple faults. I don't know exactly why, but I think it has something to do with paging. Below is a short minimal example. It is booted with UEFI, and for the sake of brevity I won't share the boot code here. I know the jump to this code works, since I can halt the processor partway through it.

typedef unsigned char UINT8;
typedef unsigned short UINT16;
typedef unsigned int UINT32;
typedef unsigned long UINT64;

struct GDT{
    UINT64 nullDescriptor;
    
    UINT16 codeLimit;
    UINT16 codeBaseLow;
    UINT8 codeBaseMid;
    UINT8 codeFlags;
    UINT8 codeLimitMid;
    UINT8 codeBaseHigh;
    
    UINT16 dataLimit;
    UINT16 dataBaseLow;
    UINT8 dataBaseMid;
    UINT8 dataFlags;
    UINT8 dataLimitMid;
    UINT8 dataBaseHigh;
}__attribute__((packed));

struct GDTR{
    UINT16 size;
    GDT* address;
}__attribute__((packed));

void main(){
    //Identity mapping
    UINT64* pml4Ptr = (UINT64*)0x200000;
    *pml4Ptr = 0x20101b;
    
    UINT64* pdpPtr = (UINT64*)0x201000;
    *pdpPtr = 0x20201b;
    
    UINT64* pdPtr = (UINT64*)0x202000;
    *pdPtr = 0x20301b;
    *(pdPtr + 1) = 0x20401b;
    
    UINT64* ptPtr = (UINT64*)0x203000;
    UINT64 physAddr = 0x1b;
    for (UINT32 i = 0; i < 2 * 512; i++){
        *(ptPtr + i) = physAddr;
        physAddr += 0x1000;
    }

    asm volatile(
    "movq $0x200018, %rax\n\t"
    "mov %rax, %cr3\n\t"
    );
        
    GDT gdt = {
        .nullDescriptor = 0,
        
        .codeLimit = 0x0000,
        .codeBaseLow = 0,
        .codeBaseMid = 0,
        .codeFlags = 0x9a,
        .codeLimitMid = 0xaf,
        .codeBaseHigh = 0,
        
        .dataLimit = 0x0000,
        .dataBaseLow = 0,
        .dataBaseMid = 0,
        .dataFlags = 0x92,
        .dataLimitMid = 0x00,
        .dataBaseHigh = 0
    };
    
    GDT* gdtAddr = &gdt;
    GDTR gdtr = { 23, gdtAddr };
    GDTR* gdtrAddr = &gdtr;
    
    asm volatile("lgdt (%0)" : : "r"(gdtrAddr));
    
    asm volatile(
    "movq $0x400000, %rsp\n\t"
    "sub $16, %rsp\n\t"
    "movq $8, 8(%rsp)\n\t"
    "movabsq $fun, %rax\n\t"
    "mov %rax, (%rsp)\n\t"
    "lretq\n\t"
    "fun:\n\t"
    "movq $0x10, %rax\n\t"
    "mov %ax, %ss\n\t"
    "mov %ax, %es\n\t"
    "mov %ax, %ds\n\t"
    "mov %ax, %gs\n\t"
    "mov %ax, %fs\n\t"
    "hlt"
    );
}

The code triple faults and I don't know why. It was working before. I tried putting a hlt instruction just after loading the CR3 register and it halts properly. Whenever I try to write a value to memory, it triple faults (probably a page fault or something). I don't see why. I think I need a second pair of eyes on this one.

I printed the virtual memory mapping with monitor info mem and it shows the expected result (I test the code on QEMU and debug with GDB). I also tried monitor info tlb, which seems correct too. Here are the outputs:

(gdb) monitor info mem
0000000000000000-0000000000400000 0000000000400000 -rw
(gdb) monitor info tlb
0000000000000000: 0000000000000000 -----CT-W
0000000000001000: 0000000000001000 -----CT-W
0000000000002000: 0000000000002000 -----CT-W
0000000000003000: 0000000000003000 -----CT-W
0000000000004000: 0000000000004000 -----CT-W
0000000000005000: 0000000000005000 -----CT-W
0000000000006000: 0000000000006000 -----CT-W
0000000000007000: 0000000000007000 -----CT-W
0000000000008000: 0000000000008000 -----CT-W
0000000000009000: 0000000000009000 -----CT-W
000000000000a000: 000000000000a000 -----CT-W
000000000000b000: 000000000000b000 -----CT-W
000000000000c000: 000000000000c000 -----CT-W
000000000000d000: 000000000000d000 -----CT-W
000000000000e000: 000000000000e000 -----CT-W
000000000000f000: 000000000000f000 -----CT-W
0000000000010000: 0000000000010000 -----CT-W
0000000000011000: 0000000000011000 -----CT-W
0000000000012000: 0000000000012000 -----CT-W
0000000000013000: 0000000000013000 -----CT-W
0000000000014000: 0000000000014000 -----CT-W
0000000000015000: 0000000000015000 -----CT-W
0000000000016000: 0000000000016000 -----CT-W
0000000000017000: 0000000000017000 -----CT-W
0000000000018000: 0000000000018000 -----CT-W
0000000000019000: 0000000000019000 -----CT-W
000000000001a000: 000000000001a000 -----CT-W
000000000001b000: 000000000001b000 -----CT-W
000000000001c000: 000000000001c000 -----CT-W
000000000001d000: 000000000001d000 -----CT-W
000000000001e000: 000000000001e000 -----CT-W
000000000001f000: 000000000001f000 -----CT-W
0000000000020000: 0000000000020000 -----CT-W
0000000000021000: 0000000000021000 -----CT-W
0000000000022000: 0000000000022000 -----CT-W
0000000000023000: 0000000000023000 -----CT-W
0000000000024000: 0000000000024000 -----CT-W
0000000000025000: 0000000000025000 -----CT-W
0000000000026000: 0000000000026000 -----CT-W
0000000000027000: 0000000000027000 -----CT-W
0000000000028000: 0000000000028000 -----CT-W
0000000000029000: 0000000000029000 -----CT-W
000000000002a000: 000000000002a000 -----CT-W
000000000002b000: 000000000002b000 -----CT-W
000000000002c000: 000000000002c000 -----CT-W
000000000002d000: 000000000002d000 -----CT-W
000000000002e000: 000000000002e000 -----CT-W
000000000002f000: 000000000002f000 -----CT-W
0000000000030000: 0000000000030000 -----CT-W
0000000000031000: 0000000000031000 -----CT-W
0000000000032000: 0000000000032000 -----CT-W
0000000000033000: 0000000000033000 -----CT-W
0000000000034000: 0000000000034000 -----CT-W
0000000000035000: 0000000000035000 -----CT-W
0000000000036000: 0000000000036000 -----CT-W
0000000000037000: 0000000000037000 -----CT-W
0000000000038000: 0000000000038000 -----CT-W
0000000000039000: 0000000000039000 -----CT-W
000000000003a000: 000000000003a000 -----CT-W
000000000003b000: 000000000003b000 -----CT-W
000000000003c000: 000000000003c000 -----CT-W
000000000003d000: 000000000003d000 -----CT-W
000000000003e000: 000000000003e000 -----CT-W
000000000003f000: 000000000003f000 -----CT-W
0000000000040000: 0000000000040000 -----CT-W
0000000000041000: 0000000000041000 -----CT-W
0000000000042000: 0000000000042000 -----CT-W
0000000000043000: 0000000000043000 -----CT-W
0000000000044000: 0000000000044000 -----CT-W
0000000000045000: 0000000000045000 -----CT-W
0000000000046000: 0000000000046000 -----CT-W
0000000000047000: 0000000000047000 -----CT-W
0000000000048000: 0000000000048000 -----CT-W
0000000000049000: 0000000000049000 -----CT-W
000000000004a000: 000000000004a000 -----CT-W
000000000004b000: 000000000004b000 -----CT-W
000000000004c000: 000000000004c000 -----CT-W
000000000004d000: 000000000004d000 -----CT-W
000000000004e000: 000000000004e000 -----CT-W
000000000004f000: 000000000004f000 -----CT-W
0000000000050000: 0000000000050000 -----CT-W
0000000000051000: 0000000000051000 -----CT-W
0000000000052000: 0000000000052000 -----CT-W
0000000000053000: 0000000000053000 -----CT-W
0000000000054000: 0000000000054000 -----CT-W
0000000000055000: 0000000000055000 -----CT-W
0000000000056000: 0000000000056000 -----CT-W
0000000000057000: 0000000000057000 -----CT-W
0000000000058000: 0000000000058000 -----CT-W
0000000000059000: 0000000000059000 -----CT-W
000000000005a000: 000000000005a000 -----CT-W
000000000005b000: 000000000005b000 -----CT-W
000000000005c000: 000000000005c000 -----CT-W
000000000005d000: 000000000005d000 -----CT-W
000000000005e000: 000000000005e000 -----CT-W
000000000005f000: 000000000005f000 -----CT-W
0000000000060000: 0000000000060000 -----CT-W
0000000000061000: 0000000000061000 -----CT-W
0000000000062000: 0000000000062000 -----CT-W
0000000000063000: 0000000000063000 -----CT-W
0000000000064000: 0000000000064000 -----CT-W
0000000000065000: 0000000000065000 -----CT-W
0000000000066000: 0000000000066000 -----CT-W
0000000000067000: 0000000000067000 -----CT-W
0000000000068000: 0000000000068000 -----CT-W
0000000000069000: 0000000000069000 -----CT-W
000000000006a000: 000000000006a000 -----CT-W
000000000006b000: 000000000006b000 -----CT-W
000000000006c000: 000000000006c000 -----CT-W
000000000006d000: 000000000006d000 -----CT-W
000000000006e000: 000000000006e000 -----CT-W
000000000006f000: 000000000006f000 -----CT-W
0000000000070000: 0000000000070000 -----CT-W
0000000000071000: 0000000000071000 -----CT-W
0000000000072000: 0000000000072000 -----CT-W
0000000000073000: 0000000000073000 -----CT-W
0000000000074000: 0000000000074000 -----CT-W
0000000000075000: 0000000000075000 -----CT-W
0000000000076000: 0000000000076000 -----CT-W
0000000000077000: 0000000000077000 -----CT-W
0000000000078000: 0000000000078000 -----CT-W
0000000000079000: 0000000000079000 -----CT-W
000000000007a000: 000000000007a000 -----CT-W
000000000007b000: 000000000007b000 -----CT-W
000000000007c000: 000000000007c000 -----CT-W
000000000007d000: 000000000007d000 -----CT-W
000000000007e000: 000000000007e000 -----CT-W
000000000007f000: 000000000007f000 -----CT-W
0000000000080000: 0000000000080000 -----CT-W
0000000000081000: 0000000000081000 -----CT-W
0000000000082000: 0000000000082000 -----CT-W
0000000000083000: 0000000000083000 -----CT-W
0000000000084000: 0000000000084000 -----CT-W
0000000000085000: 0000000000085000 -----CT-W
0000000000086000: 0000000000086000 -----CT-W
0000000000087000: 0000000000087000 -----CT-W
0000000000088000: 0000000000088000 -----CT-W
0000000000089000: 0000000000089000 -----CT-W
000000000008a000: 000000000008a000 -----CT-W
000000000008b000: 000000000008b000 -----CT-W
000000000008c000: 000000000008c000 -----CT-W
000000000008d000: 000000000008d000 -----CT-W
000000000008e000: 000000000008e000 -----CT-W
000000000008f000: 000000000008f000 -----CT-W
0000000000090000: 0000000000090000 -----CT-W
0000000000091000: 0000000000091000 -----CT-W
0000000000092000: 0000000000092000 -----CT-W
0000000000093000: 0000000000093000 -----CT-W
0000000000094000: 0000000000094000 -----CT-W
0000000000095000: 0000000000095000 -----CT-W
0000000000096000: 0000000000096000 -----CT-W
0000000000097000: 0000000000097000 -----CT-W
0000000000098000: 0000000000098000 -----CT-W
0000000000099000: 0000000000099000 -----CT-W
000000000009a000: 000000000009a000 -----CT-W
000000000009b000: 000000000009b000 -----CT-W
000000000009c000: 000000000009c000 -----CT-W
000000000009d000: 000000000009d000 -----CT-W
000000000009e000: 000000000009e000 -----CT-W
000000000009f000: 000000000009f000 -----CT-W
00000000000a0000: 00000000000a0000 -----CT-W
00000000000a1000: 00000000000a1000 -----CT-W
00000000000a2000: 00000000000a2000 -----CT-W
00000000000a3000: 00000000000a3000 -----CT-W
00000000000a4000: 00000000000a4000 -----CT-W
00000000000a5000: 00000000000a5000 -----CT-W
00000000000a6000: 00000000000a6000 -----CT-W
00000000000a7000: 00000000000a7000 -----CT-W
00000000000a8000: 00000000000a8000 -----CT-W
00000000000a9000: 00000000000a9000 -----CT-W
00000000000aa000: 00000000000aa000 -----CT-W
00000000000ab000: 00000000000ab000 -----CT-W
00000000000ac000: 00000000000ac000 -----CT-W
00000000000ad000: 00000000000ad000 -----CT-W
00000000000ae000: 00000000000ae000 -----CT-W
00000000000af000: 00000000000af000 -----CT-W
00000000000b0000: 00000000000b0000 -----CT-W
00000000000b1000: 00000000000b1000 -----CT-W
00000000000b2000: 00000000000b2000 -----CT-W
00000000000b3000: 00000000000b3000 -----CT-W
00000000000b4000: 00000000000b4000 -----CT-W
00000000000b5000: 00000000000b5000 -----CT-W
00000000000b6000: 00000000000b6000 -----CT-W
00000000000b7000: 00000000000b7000 -----CT-W
00000000000b8000: 00000000000b8000 -----CT-W
00000000000b9000: 00000000000b9000 -----CT-W
00000000000ba000: 00000000000ba000 -----CT-W
00000000000bb000: 00000000000bb000 -----CT-W
00000000000bc000: 00000000000bc000 -----CT-W
00000000000bd000: 00000000000bd000 -----CT-W
00000000000be000: 00000000000be000 -----CT-W
00000000000bf000: 00000000000bf000 -----CT-W
00000000000c0000: 00000000000c0000 -----CT-W
00000000000c1000: 00000000000c1000 -----CT-W
00000000000c2000: 00000000000c2000 -----CT-W
00000000000c3000: 00000000000c3000 -----CT-W
00000000000c4000: 00000000000c4000 -----CT-W
00000000000c5000: 00000000000c5000 -----CT-W
00000000000c6000: 00000000000c6000 -----CT-W
00000000000c7000: 00000000000c7000 -----CT-W
00000000000c8000: 00000000000c8000 -----CT-W
00000000000c9000: 00000000000c9000 -----CT-W
00000000000ca000: 00000000000ca000 -----CT-W
00000000000cb000: 00000000000cb000 -----CT-W
...
etc
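As a cross-check of the output above, here is a small host-side sketch (not part of the startup code; the constant and helper names are mine) that decodes the 0x1b flag value written into every entry and computes how much memory two page tables cover. As far as I can tell, the C and T columns in the info tlb output correspond to the PCD and PWT bits, and W to the writable bit.

```cpp
#include <cstdint>

// Low flag bits of an x86-64 page-table entry.
constexpr std::uint64_t PTE_PRESENT  = 1ull << 0;
constexpr std::uint64_t PTE_WRITABLE = 1ull << 1;
constexpr std::uint64_t PTE_USER     = 1ull << 2;
constexpr std::uint64_t PTE_PWT      = 1ull << 3; // page-level write-through
constexpr std::uint64_t PTE_PCD      = 1ull << 4; // page-level cache disable

constexpr std::uint64_t kPageSize        = 0x1000; // 4 KiB pages
constexpr std::uint64_t kEntriesPerTable = 512;

// Hypothetical helper: true if `flag` is set in `entry`.
constexpr bool has_flag(std::uint64_t entry, std::uint64_t flag) {
    return (entry & flag) != 0;
}

// Hypothetical helper: bytes covered by `tables` fully populated page tables.
constexpr std::uint64_t mapped_bytes(std::uint64_t tables) {
    return tables * kEntriesPerTable * kPageSize;
}
```

With these definitions, 0x1b decodes to present, writable, PWT and PCD (supervisor only), and two page tables cover exactly 0x400000 bytes, which matches the single range reported by info mem.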

Any idea on what is wrong?

Also, I compile the code with this script:

g++ -static -ffreestanding -nostdlib -mgeneral-regs-only -mno-red-zone -c -m64 Startup/Source/Main.cpp -oStartup/Object/Main.o
ld -entry main --oformat elf64-x86-64 --no-dynamic-linker -static -nostdlib -Ttext-segment=300000 Startup/Object/Main.o -ostartup.elf

EDIT

I think the culprit is that g++ allocates variables on the stack. When I exit UEFI, the stack pointer points above 4MB. I then load CR3, and the next stack access hits an unmapped page. Since no page fault handler is installed, the fault escalates to a triple fault.

How to fix this issue?

user123
  • An emulator / simulator should be able to log double / triple faults for you, and hopefully the original fault that started the chain. I think BOCHS can, anyway. (As well as BOCHS' built-in debugger letting you single-step the guest machine, as well as dump page tables GDT and so on.) – Peter Cordes Aug 06 '21 at 13:30
  • QEMU and GDB can do all that also. I was using QEMU because I think Bochs doesn't emulate UEFI environment. I may as well be wrong. – user123 Aug 06 '21 at 19:02
  • Ah, right, I saw the question title again later and forgot it was about UEFI. I don't know if BOCHS supports that or not; the fact that its debugger knows about real-mode segmentation will be irrelevant here. – Peter Cordes Aug 07 '21 at 00:23

2 Answers


UEFI doesn't necessarily load your executable exactly where it is linked, so your kernel may be somewhere else entirely in memory and therefore unmapped when you load cr3. Use symbols within your executable for the memory that needs to be mapped, instead of constants.

brenden
  • I may use symbols later as you mentioned, but this is not the problem. The UEFI app places the ELF kernel at the right position in RAM, and RAM is identity mapped when it exits UEFI. This is not the problem either. – user123 Aug 05 '21 at 21:35
  • Again this is not the problem. I read the file from my UEFI bootloader and then map the segments at the right position. The code is really at 0x300000 in RAM. I can look at it easily. – user123 Aug 05 '21 at 21:45
  • If it is the stack, as you are saying, then you could either use an assembly stub to create a new stack within your 4MB identity-mapped region (maybe in the bss section) or just map the region where the stack is as well. – brenden Aug 06 '21 at 01:35
  • Yes that is why the code worked before in my 0x400000 kernel. I was mapping the whole address space so it included the stack. I may try to place the stack at 0x400000 to grow downward. – user123 Aug 06 '21 at 03:25

In the end, several things were wrong with the code.

First of all, I had to add -fomit-frame-pointer to the g++ compilation command to stop the generated code from using the frame pointer (rbp). I found a good article about the frame pointer here: https://people.cs.rutgers.edu/~pxk/419/notes/frames.html. The article states the following:

Here’s what happens during function (there might be slight differences among languages/architectures)

  1. Push the current value of the frame pointer (ebp/rbp). This saves it so we can restore it later.

  2. Move the current stack pointer to the frame pointer. This defines the start of the frame.

  3. Subtract the space needed for the function’s data from the stack pointer. Remember that stacks grow from high memory to low memory. This puts the stack pointer past the space that will be used by the function so that anything pushed onto the stack now will not overwrite useful values.

  4. Now execute the code for the function. References to local variables will be negative offsets to the frame pointer (e.g., "movl $123, -8(%rbp)").

  5. On exit from the function, copy the value from the frame pointer to the stack pointer (this clears up the space allocated to the stack frame for the function) and pop the old frame pointer. This is accomplished by the “leave” instruction.

  6. Return from the procedure via a “ret” instruction. This pops the return value from the stack and transfers execution to that address.

The key point here is point 4. Without -fomit-frame-pointer, the function references its local variables at negative offsets from the frame pointer. In the prologue, RSP is copied to RBP and then RSP is decremented past the space the compiler reserved for the function's locals; from then on, every local is addressed relative to RBP. When I switch to a new stack by loading RSP, RBP is not updated, so the function keeps using the old RBP value, which still points into the old stack. One solution would be to update RBP as well, but I think a better approach is to not use RBP at all and omit the frame pointer completely.
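The arithmetic behind this can be sketched on the host as follows. This is only a model, not code from the startup file: the helper names are hypothetical, and 0x7F00000 is a made-up example of an RBP value captured while still running on the UEFI stack.

```cpp
#include <cstdint>

// Only the first 4 MiB are identity mapped after CR3 is loaded.
constexpr std::uint64_t kMappedLimit = 0x400000;

// Hypothetical helper: true if `addr` lies inside the identity map.
constexpr bool is_mapped(std::uint64_t addr) {
    return addr < kMappedLimit;
}

// With a frame pointer, a local variable lives at rbp - offset.
constexpr std::uint64_t local_via_rbp(std::uint64_t rbp, std::uint64_t offset) {
    return rbp - offset;
}

// Assumed example value: RBP saved in the prologue, on the old UEFI stack.
constexpr std::uint64_t kOldRbp = 0x7F00000;
```

Even after RSP is moved to 0x350000, a local at `local_via_rbp(kOldRbp, 0x20)` resolves to 0x7EFFFE0, which is outside the mapped 4MB, so the access faults.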

Also, I first placed the stack at 0x400000, which is wrong too. It looks fine at first because the stack grows downward, but when the frame pointer is omitted the function still decrements RSP by the size of its stack frame on entry, and then addresses its locals at positive offsets from RSP instead of negative offsets from RBP. If I set the stack pointer to 0x400000 after that decrement, the function's data ends up at positive offsets from 0x400000. Since I map only the first 4MB, those addresses lie above the mapped region, which triggers a page fault that escalates to a triple fault because no handler is installed. Instead, I should place the stack pointer lower, leaving room for the frame inside the mapped region.
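A rough host-side check of this constraint could look like the sketch below. The helper names and the 0x40-byte frame size are assumptions for illustration; the real frame size depends on what the compiler reserves for main.

```cpp
#include <cstdint>

// Only the first 4 MiB are identity mapped.
constexpr std::uint64_t kMappedLimit = 0x400000;

// Without a frame pointer, locals live at rsp + k for 0 <= k < frame_bytes,
// so the frame occupies [rsp, rsp + frame_bytes).
constexpr std::uint64_t frame_top(std::uint64_t rsp, std::uint64_t frame_bytes) {
    return rsp + frame_bytes;
}

// Hypothetical helper: true if the whole frame stays inside the identity map
// when RSP is overwritten with `rsp` after the prologue has run.
constexpr bool frame_is_mapped(std::uint64_t rsp, std::uint64_t frame_bytes) {
    return frame_top(rsp, frame_bytes) <= kMappedLimit;
}
```

With an assumed 0x40-byte frame, `frame_is_mapped(0x400000, 0x40)` is false while `frame_is_mapped(0x350000, 0x40)` is true, which is consistent with the behaviour described above.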

The code which works is the following:

typedef unsigned char UINT8;
typedef unsigned short UINT16;
typedef unsigned int UINT32;
typedef unsigned long UINT64;

struct GDT{
    UINT64 nullDescriptor;
    
    UINT16 codeLimit;
    UINT16 codeBaseLow;
    UINT8 codeBaseMid;
    UINT8 codeFlags;
    UINT8 codeLimitMid;
    UINT8 codeBaseHigh;
    
    UINT16 dataLimit;
    UINT16 dataBaseLow;
    UINT8 dataBaseMid;
    UINT8 dataFlags;
    UINT8 dataLimitMid;
    UINT8 dataBaseHigh;
}__attribute__((packed));

struct GDTR{
    UINT16 size;
    GDT* address;
}__attribute__((packed));

void main(){
    //Identity mapping
    UINT64* pml4Ptr = (UINT64*)0x200000;
    *pml4Ptr = 0x20101b;
    
    UINT64* pdpPtr = (UINT64*)0x201000;
    *pdpPtr = 0x20201b;
    
    UINT64* pdPtr = (UINT64*)0x202000;
    *pdPtr = 0x20301b;
    *(pdPtr + 1) = 0x20401b;
    
    UINT64* ptPtr = (UINT64*)0x203000;
    UINT64 physAddr = 0x1b;
    for (UINT32 i = 0; i < 2 * 512; i++){
        *(ptPtr + i) = physAddr;
        physAddr += 0x1000;
    }

    asm volatile(
    "movq $0x200018, %rax\n\t"
    "mov %rax, %cr3\n\t"
    "movq $0x350000, %rsp"
    );
        
    GDT gdt = {
        .nullDescriptor = 0,
        
        .codeLimit = 0x0000,
        .codeBaseLow = 0,
        .codeBaseMid = 0,
        .codeFlags = 0x9a,
        .codeLimitMid = 0xaf,
        .codeBaseHigh = 0,
        
        .dataLimit = 0x0000,
        .dataBaseLow = 0,
        .dataBaseMid = 0,
        .dataFlags = 0x92,
        .dataLimitMid = 0x00,
        .dataBaseHigh = 0
    };
    
    GDT* gdtAddr = &gdt;
    GDTR gdtr = { 23, gdtAddr };
    GDTR* gdtrAddr = &gdtr;
    
    asm volatile("lgdt (%0)" : : "r"(gdtrAddr));
    
    asm volatile(
    "sub $16, %rsp\n\t"
    "movq $8, 8(%rsp)\n\t"
    "movabsq $fun, %rax\n\t"
    "mov %rax, (%rsp)\n\t"
    "lretq\n\t"
    "fun:\n\t"
    "movq $0x10, %rax\n\t"
    "mov %ax, %ss\n\t"
    "mov %ax, %es\n\t"
    "mov %ax, %ds\n\t"
    "mov %ax, %gs\n\t"
    "mov %ax, %fs\n\t"
    "hlt"
    );
}
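As a sanity check on the GDT values used above, here is a hedged host-side sketch that packs a segment descriptor from its fields. `make_descriptor` is a hypothetical helper, not something from the original code; it follows the same field layout as the GDT struct (limit low, base low, base mid, access byte, limit high plus flags, base high).

```cpp
#include <cstdint>

// Hypothetical helper: pack a legacy 8-byte segment descriptor.
// `access` is the access byte (0x9a code, 0x92 data in the code above);
// `flags` is the upper nibble of the byte that also holds limit bits 19..16.
constexpr std::uint64_t make_descriptor(std::uint32_t base, std::uint32_t limit,
                                        std::uint8_t access, std::uint8_t flags) {
    return  static_cast<std::uint64_t>(limit & 0xFFFF)               // limit 15..0
         | (static_cast<std::uint64_t>(base & 0xFFFFFF) << 16)       // base 23..0
         | (static_cast<std::uint64_t>(access) << 40)                // access byte
         | (static_cast<std::uint64_t>((limit >> 16) & 0xF) << 48)   // limit 19..16
         | (static_cast<std::uint64_t>(flags & 0xF) << 52)           // G, D/B, L, AVL
         | (static_cast<std::uint64_t>((base >> 24) & 0xFF) << 56);  // base 31..24
}

// GDTR limit is the table size in bytes minus one.
constexpr std::uint16_t gdtr_limit(unsigned entries) {
    return static_cast<std::uint16_t>(entries * 8 - 1);
}
```

The code descriptor above (codeFlags 0x9a, codeLimitMid 0xaf) corresponds to `make_descriptor(0, 0xF0000, 0x9A, 0xA)`, i.e. 0x00AF9A0000000000, and the data descriptor to `make_descriptor(0, 0, 0x92, 0)`, i.e. 0x0000920000000000. Three entries give a GDTR limit of 23, matching the value passed to lgdt.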
user123
  • If your code depends on the compiler not making a prologue/epilogue that sets up/tears down RBP, almost always that means you're doing something wrong. For example, GNU C basic asm like `asm volatile( "movq $0x200018, %rax\n\t" "mov %rax, %cr3\n\t" );` (with no clobber declaration on RAX which you destroy) is undefined behaviour in a function that isn't `__attribute__((naked))`. If you want to define functions in inline asm, do it at global scope, not inside `main`. – Peter Cordes Aug 06 '21 at 06:10
  • Also, use `lea fun(%rip), %rax` instead of using a 64-bit absolute address, to make your code position-independent. – Peter Cordes Aug 06 '21 at 06:11
  • So the main function should call naked functions defined in assembly? Does that mean I need to call `ret` myself, or does the naked function do it? What about the RBP register pointing outside the mapped region? I can't fix that without disabling RBP, unless I change RBP in assembly, which seems wrong. – user123 Aug 06 '21 at 19:08
  • As to LEA I'll check that out. Thanks for suggestions anyway. – user123 Aug 06 '21 at 19:13
  • Re: RIP-relative LEA, see [How to load address of function or label into register](https://stackoverflow.com/q/57212012) – Peter Cordes Aug 07 '21 at 00:17
  • Inside a normal function (not `naked`) like your `main`, use `asm("..." : "=r"(var) : : "eax", "ecx")` or whatever. If you want to define a whole function in asm, do it at global scope, a separate file, or with a `naked` function whose body is only a Basic `asm` statement. Call it from pure C by declaring a prototype and writing `foo();` as a C statement. Yes, pure-asm functions need to end with `ret`. (You don't "call" `ret`, you execute it.) – Peter Cordes Aug 07 '21 at 00:20