
Background

I am trying to make a bootloader that would work for two architectures: x86 and PDP-11. The main OS is written for a PDP-11-compatible machine, but booting from x86 should work too, starting an emulator.

AFAIK, x86 loads the first disk sector to 0x7c00 and jumps there if the last two bytes are 0x55 0xaa. In contrast, the PDP-11-compatible machine loads the first sector to 0o20000 (octal) and executes it if the first instruction is NOP and the last two bytes are 0xaa 0x55. However, due to some hardware details, the loaded data is actually bitwise inverted -- for example, where x86 would read 0x12, the other machine would read 0xed. In this context that is convenient: if I make the last two bytes 0x55 0xaa, the PDP-11-compatible machine reads them as 0xaa 0x55, so the same signature works for both machines.

Consequently, the PDP-11-compatible machine requires the first word to be a NOP instruction, i.e. 0o000240, or 0x00a0. The data is inverted, so x86 would actually read the word 0xff5f instead, i.e. the bytes 0x5f 0xff in memory order.
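The byte arithmetic above can be checked with a short sketch. It assumes only what is stated here: every byte is bitwise inverted on the PDP-11-compatible side, and words are little-endian on both machines. The helper name `invert` is made up for illustration.

```python
# Sketch: how the same on-disk bytes look to each machine, assuming the
# PDP-11-compatible side sees every byte bitwise inverted.

def invert(data: bytes) -> bytes:
    """Return the bytes as the other machine would see them."""
    return bytes(b ^ 0xFF for b in data)

PDP11_NOP = 0o000240  # 0x00a0

# Both machines are little-endian, so the low byte comes first.
nop_seen_by_pdp11 = PDP11_NOP.to_bytes(2, "little")  # a0 00
nop_on_disk = invert(nop_seen_by_pdp11)              # what x86 executes

signature_on_disk = bytes([0x55, 0xAA])              # what x86 checks
signature_seen_by_pdp11 = invert(signature_on_disk)  # what the PDP-11 side checks

print(nop_on_disk.hex(" "))              # 5f ff
print(signature_seen_by_pdp11.hex(" "))  # aa 55
```

So the first byte x86 executes is 0x5f, followed by 0xff as the first byte of whatever comes next.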

Problem

0x5f is a valid x86 instruction. Unfortunately, it's pop di. AFAIK, the values of ss and sp on entry to the boot sector are not specified, so this instruction reads from an unknown location.

My questions are:

  • In practice, can I assume they either point at valid stack or are both set to some placeholder, e.g. 0x0000:0x0000 or 0xffff:0xffff?
  • May ss:sp point to memory-mapped hardware registers which are unsafe to read? If yes, what is the worst thing that can happen if I read them? I don't want to accidentally kill a laptop.
  • May ss:sp point to unavailable memory, i.e. may pop di trigger a bus error? If yes, how will the BIOS recover from it, i.e. will it reboot, show a message or do something else?
Ivanq
  • It's safe to use at least a small amount of stack space below SS:SP on entry to a bootloader. It will be used asynchronously by interrupts including keyboard; real-mode doesn't have a separate kernel stack. (IIRC, interrupts are enabled when the firmware jumps to the code of a legacy BIOS MBR boot sector.) – Peter Cordes Jun 10 '20 at 21:11
  • But x86 grows the stack downward, so [`pop di`](https://www.felixcloutier.com/x86/pop) is actually doing `sp += 2` after reading the value at `[sp]`. If the initial stack is right below the boot sector, this leaves SP pointing at an instruction that hasn't executed yet. Or it wraps SP. I don't think that faults, though. – Peter Cordes Jun 10 '20 at 21:14
  • I accidentally used `sp-2` instead of `sp` (as if the stack grew upward), that's fixed now. Sorry for confusion. – Ivanq Jun 10 '20 at 21:16
  • `ss:sp = 0x0000:0x7c00` case is a valid point, I'm afraid I can't fix it though -- putting `cli` as the second instruction, even if it's possible, doesn't fix the problem. – Ivanq Jun 10 '20 at 21:18
  • The chance of an interrupt arriving before you can disable interrupts is probably small enough for a polyglot machine code hobby project. Also, future hardware in the next few years may stop supporting legacy BIOS booting and only support UEFI. – Peter Cordes Jun 10 '20 at 21:20
  • I think so. You might want to transform the comments to an answer, I think you answered my questions. – Ivanq Jun 10 '20 at 21:21
  • There are other SO users more familiar with x86 BIOS booting that could answer better. I'm not sure if initial SS:SP being right below `0000:7c00` is common, but I think I've seen someone say that it's sometimes done. [Default registers and segments value on booting x86 machine](https://stackoverflow.com/q/43359327) says the stack could be anywhere in RAM. – Peter Cordes Jun 10 '20 at 21:26
  • @PeterCordes According to Intel that was supposed to happen by 2020. I don't know if anyone else actually is planning dropping support for legacy booting. – Ross Ridge Jun 10 '20 at 21:27
  • Note that the `0x55 0xaa` bytes at the end of a PC boot sector are only required to be there when booting from hard disks (and things emulated as hard disks). It's not required when booting from floppy (and things emulated as floppies). – Ross Ridge Jun 10 '20 at 21:29
  • If ss:sp is 7c00 and an interrupt arrives after the pop di and before the next instruction, then the interrupt would cause cs to be written to 7c00/7c01. CS could in theory be any value between 0 and 7c0, but in practice it is certainly one of those two values. If you can make the instruction encoding after the pop di to be robust to having its first byte be changed from ff (the original value) to either 00 or 07, then you can handle even that unlikely scenario. – prl Jun 11 '20 at 00:52
  • However, a more likely scenario (but still not too likely) is that ss:sp is immediately below some BIOS data structure, and an interrupt occurs after the pop di and before cli. It doesn’t seem possible to prevent that, but it is sufficiently unlikely that I think you can ignore it. You can do push di immediately (even before cli) to restore the value and sp, keeping the window of vulnerability small. – prl Jun 11 '20 at 01:09
  • According to the Intel SDM, pop di (0x5f) cannot fault in real mode under any circumstances. The only faults listed for pop in real mode are related to a memory destination, not a register, or a lock prefix, which obviously you don’t have. – prl Jun 11 '20 at 01:14
  • @fuz: I think you are mistaking `di` for `dx`. (I do have the problem that [`dl` is overwritten by a `pop dx` instruction](https://hg.ulukai.org/ecm/ldosboot/file/e4657346d2fb/iniload.asm#l313) in my loader but it is due to the "MZ" signature, which contains `pop dx`. Solution here is to depend on the value in the boot sector pointed to by `ss:bp` and/or at 0:7C00h.) – ecm Jun 11 '20 at 12:16
  • @prl: `pop di` will fault in R86M if `ss` segment limit is (the default) 64 KiB and `sp` is 0FFFFh, because the stack read will read a word from that offset then. This is unlikely but can happen. – ecm Jun 11 '20 at 12:21
  • @ecm: That can't happen with an aligned SP, though, and I think it's safe to assume that the initial SP is aligned by 2 at least. But interesting point that the high byte that attempts to load is outside the SS limit, rather than wrapping. Semi-related: [Is it allowed to access memory that spans the zero boundary in x86?](https://stackoverflow.com/q/47702410) - wrapping at the address-width boundary is fine. But in real mode, the linear address width is wider than the offset width, unlike in long mode. – Peter Cordes Jun 11 '20 at 12:30
  • @ecm Oh indeed! Never mind then. – fuz Jun 11 '20 at 13:47
  • @ecm, I expected that to be true, but it’s not listed in the SDM. – prl Jun 11 '20 at 15:35
  • @prl: Just to be sure I checked on several machines. dosemu2 and qemu (both without kvm on a Debian 9 amd64 server) do not fault. My actual hardware 686 (Pentium III) machine does fault with an exception/interrupt 0Ch. (It is the same as an interrupt because this is in R86M directly booted into the debugger.) – ecm Jun 11 '20 at 15:48
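
The stack behaviour discussed in the comments can be traced with a toy model. This is a sketch, not an emulator, and it rests on stated assumptions: `ss:sp = 0x0000:0x7c00` on entry, an interrupt arriving right after `pop di`, a 64 KiB stack segment limit, and a word access at offset 0xFFFF treated as a fault (the behaviour reported above on a Pentium III; dosemu2 and qemu were reported not to fault). The class names and the FLAGS value are made up for illustration.

```python
# Toy model of a 16-bit real-mode stack; not an emulator.

class StackFault(Exception):
    """Stands in for the stack fault (interrupt 0Ch) reported on hardware."""

class Machine:
    def __init__(self, ss, sp):
        self.ss, self.sp = ss, sp
        self.mem = {}     # sparse memory: linear address -> byte
        self.writes = []  # linear addresses written, in order

    def _linear(self, offset):
        return (self.ss << 4) + offset

    def _check_word(self, offset):
        # Assumption: a word at offset 0xFFFF crosses the 64 KiB limit.
        if offset == 0xFFFF:
            raise StackFault("word access crosses the segment limit")

    def pop(self):
        # pop reads the word at [ss:sp], then adds 2 to sp.
        self._check_word(self.sp)
        lo = self.mem.get(self._linear(self.sp), 0)
        hi = self.mem.get(self._linear(self.sp + 1), 0)
        self.sp = (self.sp + 2) & 0xFFFF
        return lo | (hi << 8)

    def push(self, value):
        # push subtracts 2 from sp, then writes the word at [ss:sp].
        self.sp = (self.sp - 2) & 0xFFFF
        self._check_word(self.sp)
        for i in range(2):
            addr = self._linear(self.sp + i)
            self.mem[addr] = (value >> (8 * i)) & 0xFF
            self.writes.append(addr)

    def interrupt(self, flags, cs, ip):
        # A real-mode interrupt pushes FLAGS, then CS, then IP.
        for word in (flags, cs, ip):
            self.push(word)

m = Machine(ss=0x0000, sp=0x7C00)
m.pop()                       # pop di
sp_after_pop = m.sp           # 0x7c02: now points into the boot sector
m.interrupt(flags=0x0202, cs=0x0000, ip=0x7C01)
clobbered = [a for a in m.writes if a >= 0x7C00]

try:
    Machine(ss=0x0000, sp=0xFFFF).pop()
    odd_sp_faulted = False
except StackFault:
    odd_sp_faulted = True
```

In this model the only boot sector bytes overwritten are 0x7c00 and 0x7c01, and the word that lands there is the first one pushed, FLAGS; CS and IP go to 0x7bfe and 0x7bfc. Doing `push di` right after `pop di`, as suggested above, shrinks the window in which this can happen but does not close it.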
