Qemu and Raw Binary File

Question

I am compiling and running binaries (boot sector, stage 1, stage 2) for practice. The boot sector and the first stage are both written in assembly and run fine. The second stage loads at 0x1000 and starts with some assembly that jumps to the start of my C code. My jumps and calls seem to be off (short) by two bytes.

I have tried the code in Bochs and Qemu (stepping through it), and all the code looks good. I have even disassembled it in IDA and everything looks good. I suspect the problem may be my lack of knowledge about code alignment.

The 2nd stage starts at 0x1000:

0x1000: cli    
0x1001: xor    eax,eax
0x1003: mov    eax,0x1f1a
0x1008: mov    esp,eax
0x100a: sti    
0x100b: jmp    0x1010

The first jump lands at 0x1010 (this is disassembled C code):

0x1010: push   0x16b4
0x1015: call   0x14ca   <---
0x101a: add    esp,0x4
0x101d: jmp    0x101d

The call above to 0x14CA actually lands at 0x000014c9, two bytes short.

As the code above shows, I expect each jump or call to land at its operand address, but it always lands two bytes short.

How does the code at 0x1010 get combined with the code at 0x14CA? Is it all C code? — 1201ProgramAlarm, Jun 26 '19 at 01:27
It will be hard to help unless you show us all your code and how you build the kernel and different stages and how you put it together. Do you have a github project? — Michael Petch, Jun 26 '19 at 02:34
They are just files on my computer right now. I am using Open Watcom v2 as the compiler, so I link with wlink. — Michael Greene, Jun 26 '19 at 09:52
@MichaelGreene Please post your full source code and the exact commands you type to link. — fuz, Jun 26 '19 at 10:12

Michael Petch · Accepted Answer · 2019-06-26T14:56:30.973

This is a wild guess that may actually be wrong. It is based on the fact that the relative JMP and CALL instructions you encoded are 5 bytes long in 32-bit code and 3 bytes long in 16-bit code: 5 bytes - 3 bytes = 2 bytes. Relative JMP and CALL targets are computed from the start of the next instruction, so this difference may hint at what has gone wrong.

If I take this code:

bits 32
org 0x1000

    cli
    xor    eax,eax
    mov    eax,0x1f1a
    mov    esp,eax
    sti
    jmp    0x1010
    push   0x16b4
    call   0x14ca
    add    esp,0x4
    jmp    0x101d

And assemble it with:

nasm -f bin stage2.asm -o stage2.bin

And review the 32-bit decoding with:

ndisasm -b32 -o 0x1000 stage2.bin

I get:

00001000  FA                cli
00001001  31C0              xor eax,eax
00001003  B81A1F0000        mov eax,0x1f1a
00001008  89C4              mov esp,eax
0000100A  FB                sti
0000100B  E900000000        jmp dword 0x1010
00001010  68B4160000        push dword 0x16b4
00001015  E8B0040000        call dword 0x14ca
0000101A  83C404            add esp,byte +0x4
0000101D  E9FBFFFFFF        jmp dword 0x101d

This looks correct. If, however, I decode the same code as 16-bit with:

ndisasm -b16 -o 0x1000 stage2.bin

I get:

00001000  FA                cli
00001001  31C0              xor ax,ax
00001003  B81A1F            mov ax,0x1f1a
00001006  0000              add [bx+si],al
00001008  89C4              mov sp,ax
0000100A  FB                sti
0000100B  E90000            jmp word 0x100e
0000100E  0000              add [bx+si],al
00001010  68B416            push word 0x16b4
00001013  0000              add [bx+si],al
00001015  E8B004            call word 0x14c8
00001018  0000              add [bx+si],al
0000101A  83C404            add sp,byte +0x4
0000101D  E9FBFF            jmp word 0x101b
00001020  FF                db 0xff
00001021  FF                db 0xff

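The instruction decoding is incorrect. However, the JMPs and CALLs are still present, and each one goes to a location two bytes short of its intended target (0x100e instead of 0x1010, 0x14c8 instead of 0x14ca, 0x101b instead of 0x101d). This looks very much like what you are observing.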

Without seeing your code, I can only hope that you have entered 32-bit protected mode by the time you start executing stage 2 at 0x1000. If you haven't, I suspect that is the root of your problem: 32-bit encoded instructions are being executed in 16-bit real mode.

Update

In the comments, the OP explains that they entered 32-bit protected mode only as part of the process of entering unreal mode. They believed unreal mode would still decode instructions as 32-bit code, and that was the cause of the problem.

You get into unreal mode by entering 32-bit protected mode and then returning to 16-bit real mode. Unreal mode is still 16-bit real mode, except that the limits in the hidden descriptor cache are set to 0xffffffff (a 4GiB limit). Once back in 16-bit real mode, you can directly address memory beyond the usual 64KiB segment limit using 32-bit addressing, but the code is still running in 16-bit real mode.

If you are writing code for 16-bit unreal mode, your compiler and assembler still need to generate 16-bit code. If you intend to write or generate 32-bit code, unreal mode isn't an option; you will need to enter 32-bit protected mode to execute it.

I was not entering protected mode in stage 1 just before my jump into stage 2. In stage 1 I used code that set up unreal mode, not realizing that when I switched back, the CPU would decode my 32-bit code as 16-bit. — Michael Greene, Jun 26 '19 at 10:03
@MichaelGreene : Yep, you get into unreal mode by entering 32-bit protected mode and coming back out. Unreal mode is still 16-bit real mode, with the exception that the limits in the hidden descriptor entries are set to 0xffffffff. Once you return to real mode, you'll be able to directly address memory in a segment beyond 64KiB, but the code is still running in 16-bit real mode. If you are writing code for 16-bit unreal mode, your compiler and assembler still need to generate 16-bit code. If you intend to write 32-bit code, then unreal mode isn't an option. — Michael Petch, Jun 26 '19 at 14:33
