
I am trying to write a very basic kernel for i386 32-bit. I have been following JamesM's kernel tutorials. My kernel code itself currently does very little - where I am struggling is trying to get grub to correctly load the higher-half format I've chosen to work with.

The advice for doing that is described in this answer and I'm working roughly with the higher half bare bones. My test environment is bochs, with a grub floppy (0.97/legacy).

To give you an idea of where I'm up to, I have a very simple do-almost-nothing "main" routine:

[section .text]
align 4

; setting up entry point for linker
start equ (start_vma - 0xBFF00000)

; entry point
start_vma:
    push ebx
    mov  eax, 0xCCCCCCCC
    hlt

This technique gives me:

$ objdump -x kernel | grep "start"
start address 0x00100000
c0000000 g       .text   00000000 start_vma
00100000 g       .text   00000000 start
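As a quick sanity check on those two symbol values, the `equ` expression can be reproduced by hand. The sketch below is only that arithmetic written out in Python; the constant and function names are illustrative and not part of the kernel source.

```python
# Values taken from the assembly snippet and the objdump output above.
START_VMA = 0xC0000000    # address at which start_vma is linked (virtual)
LMA_OFFSET = 0xBFF00000   # constant subtracted in the `equ` expression


def physical_entry(vma: int, offset: int) -> int:
    """Mirror `start equ (start_vma - offset)`."""
    return vma - offset


start = physical_entry(START_VMA, LMA_OFFSET)
print(hex(start))  # 0x100000, i.e. the 1MB mark
```

So the entry point handed to the linker sits at exactly 1MB, while the code it labels is linked at 3GB.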

Then what I have done is to use the linker script to set the LMA (physical address load points) to 1MB, but the VMAs to 3GB, like so:

OUTPUT_FORMAT(elf32-i386)
ENTRY(start)
KERNEL_MBOOT = 0x400;
KERNEL_VMA        = 0xC0000000;
KERNEL_LMA_OFFSET = 0xBFF00000;

PHDRS
{
    headers PT_PHDR PHDRS ;
    mboot PT_LOAD FILEHDR ;
    text PT_LOAD FILEHDR ;
    data PT_LOAD ;
}

SECTIONS
{
  . = KERNEL_MBOOT;

  .mboot : AT(KERNEL_VMA - KERNEL_LMA_OFFSET)
  {
     *(.mboot)
  } : mboot

  . = KERNEL_VMA;

  .text : AT(ADDR(.text)+ADDR(.mboot) - KERNEL_LMA_OFFSET)
  {
    code = .; _code = .; __code = .;
    *(.text)
    *(.rodata*)
  } : text

  .data ALIGN (0x1000) : AT(ADDR(.data) - KERNEL_LMA_OFFSET)
  {
     data = .; _data = .; __data = .;
     *(.data)
     *(.rodata)
  } : data

  .bss : AT(ADDR(.bss) - KERNEL_LMA_OFFSET)
  {
    bss = .; _bss = .; __bss = .;
    *(.bss)
    *(COMMON)
    ebss = .;
  } : data

  end = .; _end = .; __end = .;
}

Or at least so I believe (explanation). Certainly, this is what I have in my object file:

$ objdump -h kernel.bin
file format elf32-i386

Sections:
Idx Name          Size      VMA       LMA       File off  Algn
  0 .mboot        0000000c  00000400  00100000  00000400  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  1 .text         0000002e  c0000000  00100400  00001000  2**4
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  2 .data         00000001  c0001000  00101000  00002000  2**12
                  CONTENTS, ALLOC, LOAD, DATA
  3 .bss          00004000  c0001020  00101020  00002001  2**5
                  ALLOC
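To double-check that the LMA column is what the AT() expressions ask for, the expressions can be evaluated by hand. This is a small sketch of that arithmetic, assuming ADDR(x) is simply the VMA that objdump reports for section x; the dictionary names are illustrative.

```python
KERNEL_VMA = 0xC0000000
KERNEL_LMA_OFFSET = 0xBFF00000

# VMAs as reported by `objdump -h` above.
vma = {
    ".mboot": 0x00000400,
    ".text": 0xC0000000,
    ".data": 0xC0001000,
    ".bss": 0xC0001020,
}

# Each entry mirrors the AT(...) expression of the matching output section.
lma = {
    ".mboot": KERNEL_VMA - KERNEL_LMA_OFFSET,
    ".text": vma[".text"] + vma[".mboot"] - KERNEL_LMA_OFFSET,
    ".data": vma[".data"] - KERNEL_LMA_OFFSET,
    ".bss": vma[".bss"] - KERNEL_LMA_OFFSET,
}

for name, addr in lma.items():
    print(f"{name:7s} LMA = {addr:#010x}")
```

The computed values (0x00100000, 0x00100400, 0x00101000, 0x00101020) agree with the LMA column, so the section-level addresses are what the script requests.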

The .mboot section contains just the multiboot header. Indeed, running mbchk gives:

kernel: The Multiboot header is found at the offset 1024.
kernel: Page alignment is turned on.
kernel: Memory information is turned on.
kernel: Address fields is turned off.
kernel: All checks passed.
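For reference, the kind of check mbchk reports on can be sketched in a few lines. The Multiboot specification requires the header to lie within the first 8192 bytes of the image, to be 4-byte aligned, to start with the magic 0x1BADB002, and to satisfy magic + flags + checksum == 0 modulo 2^32. The function names below are hypothetical, and the flags value 0x3 is an assumption inferred from the mbchk output (bits 0 and 1 set, bit 16 clear); this is not mbchk's actual code.

```python
import struct

MULTIBOOT_MAGIC = 0x1BADB002
SEARCH_LIMIT = 8192  # the whole header must lie within the first 8192 bytes


def find_multiboot_header(image: bytes):
    """Return (offset, flags) of the first valid header, or None."""
    limit = min(len(image), SEARCH_LIMIT)
    for offset in range(0, limit - 11, 4):  # 4-byte aligned, 12 bytes long
        magic, flags, checksum = struct.unpack_from("<III", image, offset)
        if magic == MULTIBOOT_MAGIC and (magic + flags + checksum) & 0xFFFFFFFF == 0:
            return offset, flags
    return None


def describe_flags(flags: int) -> dict:
    """Decode the three flag bits that mbchk reports on."""
    return {
        "page_align": bool(flags & (1 << 0)),
        "memory_info": bool(flags & (1 << 1)),
        "address_fields": bool(flags & (1 << 16)),
    }


# Synthetic image: 1024 bytes of padding, then a 12-byte header (assumed flags).
flags = 0x00000003
checksum = -(MULTIBOOT_MAGIC + flags) & 0xFFFFFFFF
image = bytes(1024) + struct.pack("<III", MULTIBOOT_MAGIC, flags, checksum)

print(find_multiboot_header(image))  # (1024, 3)
print(describe_flags(flags))
```

Note that a check like this only validates the header. It says nothing about whether the ELF program headers describe loadable addresses.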

So all in all, this looks like a valid ELF kernel that should boot and then do absolutely nothing.

However, it actually isn't booting. Instead, grub complains like so:

error 28: selected item cannot fit into memory

I am building with gcc 4.7.2 on x64 fedora, with:

CFLAGS=-nostdlib -fno-builtin -fno-stack-protector -ffreestanding -m32 
LDFLAGS=-melf_i386 -Tlink.ld

Where link.ld is the script I've included above.

My question is, what am I doing wrong in the above to make grub believe that the kernel cannot fit into memory? I have tried:

  • Altering the VMA_OFFSET fields so that the VMA matches the LMA, i.e. loading everything at 1MB both physically and virtually. This makes no difference; I get the same error.
  • Bringing the LMA to below 1MB. Causes grub error 7: cannot load below 1MB, as expected.
  • Binge drinking tea and coffee.

None of these things worked. I have the feeling I am missing something quite obvious, but can't work out what it is.


Update: looking at the grub multiboot elf line, here's what I see:

 [Multiboot-elf, <0xffc00:0x40c:0x0>

Crucially, I believe what I am supposed to see is something like the line featured in this question, namely it should have an entry= part!

Anyway, I looked at providing an init region:

[section .init]
; setting up entry point for linker

; entry point
start:
    push ebx
    mov  eax, 0xCCCCCCCC
    ; jmp  later
    hlt  

Which I linked like so:

.init : AT(ADDR(.mboot) + (KERNEL_VMA - KERNEL_LMA_OFFSET))
{
    *(.init) 
} : init

To give:

 Idx Name          Size      VMA       LMA       File off  Algn
 1   .init         0000000c  00100000  00100400  00001000  2**0

Which has a VMA/LMA of around 1M, so this should load... but I'm still getting complaints about the selected item not fitting into memory.

  • I see this sentence in the answer you linked to: "Since the code in the early init section is linked at the same address as where it is loaded, GRUB can jump into this code without any problems." Which I understand to mean that both the physical and virtual addresses are 1MiB. – Adrian Panasiuk Jun 20 '13 at 21:15
  • @AdrianPanasiuk ah, yep. My start symbol in the asm snippet I posted should be available at a lower address. Do I need to separate it into a different section, then? I've edited the symbol addresses into the answer. Will give separate sections a go - I'm a linker script wizard now (sort of!). –  Jun 20 '13 at 21:16
  • What version of grub are you using ? I do not find any trace of your error message in grub-1.99 or in grub-2.00 source code. – Coren Jun 29 '13 at 18:15
  • @Coren grub 0.97, the last "legacy" version. I did think about switching to grub 2 and have tried this on a VM - grub2 reports "overlap" or something similar to that. –  Jun 29 '13 at 18:46

2 Answers


In the grub 0.97 source code, this error message comes from stage2/common.c:89

[ERR_WONT_FIT] = "Selected item cannot fit into memory",

This error code is used about a dozen times:

stage2/char_io.c:1215:    errnum = ERR_WONT_FIT;
stage2/boot.c:276:    errnum = ERR_WONT_FIT;
stage2/boot.c:280:  errnum = ERR_WONT_FIT;
stage2/shared.h:540:  ERR_WONT_FIT,
stage2/builtins.c:1822:     errnum = ERR_WONT_FIT;
stage2/builtins.c:2386:      errnum = ERR_WONT_FIT;
stage2/builtins.c:2498:      errnum = ERR_WONT_FIT;
stage2/builtins.c:2594:   errnum = ERR_WONT_FIT;
stage2/builtins.c:2932:   errnum = ERR_WONT_FIT;
stage2/builtins.c:3711:   errnum = ERR_WONT_FIT;
stage2/builtins.c:3744:   errnum = ERR_WONT_FIT;

Since you have this message during boot, let's take a deeper look at boot.c

 if (! big_linux
      && text_len > linux_data_real_addr - (char *) LINUX_ZIMAGE_ADDR)
    {
      grub_printf (" linux 'zImage' kernel too big, try 'make bzImage'\n");
      errnum = ERR_WONT_FIT;
    }
 else if (linux_data_real_addr + LINUX_SETUP_MOVE_SIZE
               > RAW_ADDR ((char *) (mbi.mem_lower << 10)))
        errnum = ERR_WONT_FIT;

Since you do not get the grub_printf message, maybe you are hitting this condition:

linux_data_real_addr + LINUX_SETUP_MOVE_SIZE > RAW_ADDR ((char *) (mbi.mem_lower << 10))

=> Maybe your problem is linked to your 3GB target. You could try to make it work with 1GB of memory. There have historically been many problems with files or memory above 2GB on 32-bit CPUs.

Your other chance of finding it lies in stage2/char_io.c:1215

  if ((addr < RAW_ADDR (0x1000))
      || (addr < RAW_ADDR (0x100000)
          && RAW_ADDR (mbi.mem_lower * 1024) < (addr + len))
      || (addr >= RAW_ADDR (0x100000)
          && RAW_ADDR (mbi.mem_upper * 1024) < ((addr - 0x100000) + len)))
    errnum = ERR_WONT_FIT;

Your start address at 0x00100000 could trigger this condition.
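Whether that condition fires depends on the load address, the length and the memory sizes grub detected, so it can help to evaluate it with concrete numbers. The sketch below is a Python transcription of the quoted check under some loud assumptions: RAW_ADDR is treated as the identity, the memory sizes (639KB lower, 31MB upper) are hypothetical, and the second test case uses the 0xffc00:0x40c pair from the Multiboot-elf line in the question. It does not establish which of grub's checks actually produced error 28.

```python
def wont_fit(addr: int, length: int, mem_lower_kb: int, mem_upper_kb: int) -> bool:
    """Transcription of the quoted condition (RAW_ADDR assumed to be identity)."""
    return (
        addr < 0x1000
        or (addr < 0x100000 and mem_lower_kb * 1024 < addr + length)
        or (addr >= 0x100000 and mem_upper_kb * 1024 < (addr - 0x100000) + length)
    )


# Hypothetical machine: 639KB of lower memory, 31MB of upper memory.
MEM_LOWER_KB = 639
MEM_UPPER_KB = 31 * 1024

# A small load at exactly 1MB passes the check on such a machine.
print(wont_fit(0x100000, 0x40C, MEM_LOWER_KB, MEM_UPPER_KB))  # False

# A load starting just below 1MB runs past the end of lower memory.
print(wont_fit(0x0FFC00, 0x40C, MEM_LOWER_KB, MEM_UPPER_KB))  # True
```

Under these assumptions, a start address of exactly 0x00100000 only trips the check when the upper memory is smaller than the data being loaded, whereas an address slightly below 1MB trips it on any typical machine.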

Coren
  • I'm not sure where you heard that computer science has problems with mem > 2GB on 32-bit CPUs, but the Windows kernel loads at virtual address 2GB and the Linux kernel at 3GB (you can also ask Windows to load at 3GB) - which is the design I'm trying to emulate. Also, the Linux bootloader code won't be relevant here - Linux has its own kernel format grub knows how to load, whereas I'm using a plain old ELF file with a multiboot header. So the `zImage` code is mostly Linux-specific. –  Jun 30 '13 at 11:44
1

After a few late nights, lots of "this was a stupid idea in the first place!" and general frustration, I figured out what was causing the issue.

The problem was the use of PHDRS, which you can see by using objdump -x kernel and looking at the very first table. A broken executable looks like this:

Program Header:
PHDR off    0x00000034 vaddr 0x00001034 paddr 0x00000000 align 2**2
     filesz 0x00000080 memsz 0x00000080 flags r--
LOAD off    0x00000000 vaddr 0x00000000 paddr 0x000ffc00 align 2**12
     filesz 0x0000040c memsz 0x0000040c flags r--
LOAD off    0x00000000 vaddr 0xbffff000 paddr 0x000ff400 align 2**12
     filesz 0x0000102e memsz 0x0000102e flags r-x
LOAD off    0x00002000 vaddr 0xc0001000 paddr 0x00101000 align 2**12
     filesz 0x00000001 memsz 0x00004020 flags rw-

and can you see this?

 [Multiboot-elf, <0xffc00:0x40c:0x0>

The first load entry in this table is the value printed by grub here. paddr means, unsurprisingly, physical address, and it is below 1MB which is wrong (grub won't load anything below 1MB, period, 3GB higher half stuff or not).

What causes this? Well, I altered my PHDRS like so:

 PHDRS
 {
-    headers PT_PHDR PHDRS ;
-    mboot PT_LOAD FILEHDR ;
-    text PT_LOAD FILEHDR ;
+    headers PT_PHDR ;
+    mboot PT_LOAD ;
+    text PT_LOAD ;
     data PT_LOAD ;
 }

and all of a sudden:

Program Header:
PHDR off    0x00000000 vaddr 0x00000000 paddr 0x00000000 align 2**2
     filesz 0x00000000 memsz 0x00000000 flags ---
LOAD off    0x00000400 vaddr 0x00000400 paddr 0x00100000 align 2**12
     filesz 0x0000000c memsz 0x0000000c flags r--
LOAD off    0x00001000 vaddr 0xc0000000 paddr 0x00100400 align 2**12
     filesz 0x0000002e memsz 0x0000002e flags r-x
LOAD off    0x00002000 vaddr 0xc0001000 paddr 0x00101000 align 2**12
     filesz 0x00000001 memsz 0x00004020 flags rw-

Yep, you got it! This finds the entry and loads away just fine.
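The difference between the two tables can be boiled down to one test: does any LOAD entry have a physical address below 1MB? A minimal sketch of that comparison, with the paddr values copied from the two program header listings above (the list and function names are illustrative):

```python
ONE_MB = 0x100000

# paddr of each LOAD entry, copied from the two program header tables.
broken = [0x000FFC00, 0x000FF400, 0x00101000]
fixed = [0x00100000, 0x00100400, 0x00101000]


def below_1mb(paddrs):
    """Return the physical addresses that start below the 1MB mark."""
    return [hex(p) for p in paddrs if p < ONE_MB]


print(below_1mb(broken))  # ['0xffc00', '0xff400']
print(below_1mb(fixed))   # []
```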

So what do FILEHDR and PHDRS mean? According to the linker script documentation:

You may use the FILEHDR and PHDRS keywords after the program header type to further describe the contents of the segment. The FILEHDR keyword means that the segment should include the ELF file header. The PHDRS keyword means that the segment should include the ELF program headers themselves.

I think the linker is being a bit fancy here and aligning the code where asked in physical memory, but starting the segment earlier to accommodate the various headers I inadvertently asked it to include, not realising what it did. This makes the program "not fit", although I think the grub error message might be wrong. You can see this in action by comparing the program headers table to the section table output from objdump -x.
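The numbers are consistent with that reading. If a segment has to begin at file offset 0 (to cover the headers) while its section stays at the requested address, the segment start moves back by the section's file offset. The sketch below reproduces the broken table's first two LOAD entries from the section table in the question; it is an interpretation that fits the observed values, not a description of how ld is implemented, and the function name is hypothetical.

```python
# (LMA, VMA, file offset, size) from `objdump -h` in the question.
sections = {
    ".mboot": (0x00100000, 0x00000400, 0x0400, 0x0C),
    ".text": (0x00100400, 0xC0000000, 0x1000, 0x2E),
}


def segment_from_file_start(lma, vma, file_off, size):
    """Segment that starts at file offset 0 but keeps the section in place."""
    return {
        "paddr": lma - file_off,
        "vaddr": vma - file_off,
        "filesz": file_off + size,
    }


for name, fields in sections.items():
    seg = segment_from_file_start(*fields)
    print(name, {key: hex(value) for key, value in seg.items()})
```

For .mboot this gives paddr 0xffc00 and filesz 0x40c, and for .text it gives paddr 0xff400, vaddr 0xbffff000 and filesz 0x102e, matching the broken program headers.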