
For example, with a boot sector that uses the BIOS to print the character a to the screen, main.asm:

org 0x7c00
bits 16
cli
mov ax, 0x0E61
int 0x10
hlt
times 510 - ($-$$) db 0
dw 0xaa55

Then:

nasm -o main.img main.asm
qemu-system-i386 -hda main.img -S -s &
gdb -ex 'target remote localhost:1234' \
    -ex 'break *0x7c00' \
    -ex 'continue' \
    -ex 'x/3i $pc'

I get:

0x7c00:      cli    
0x7c01:      mov    $0x10cd0e61,%eax
0x7c06:      hlt 

So it looks like the mov ax, 0x0E61 was interpreted as a 32-bit mov %eax and ate up the next instruction int 0x10 as data.
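To make the misdecoding concrete, here is a minimal Python sketch of a toy decoder. It only knows the four opcodes used in this boot sector, and the names `BOOT_CODE` and `decode` are made up for illustration; it is not how GDB's disassembler is implemented. The only thing the operand size changes is how many immediate bytes the `B8` opcode consumes.

```python
# Machine code of the boot sector above: cli; mov ax, 0x0E61; int 0x10; hlt
BOOT_CODE = bytes([0xFA, 0xB8, 0x61, 0x0E, 0xCD, 0x10, 0xF4])


def decode(code, base, operand_size):
    """Toy decoder for this example only; operand_size is 16 or 32."""
    out = []
    pc = 0
    while pc < len(code):
        op = code[pc]
        if op == 0xFA:
            text, length = "cli", 1
        elif op == 0xF4:
            text, length = "hlt", 1
        elif op == 0xCD:
            # int imm8: one opcode byte plus one immediate byte
            text, length = f"int 0x{code[pc + 1]:x}", 2
        elif op == 0xB8:
            # mov ax/eax, imm: the immediate is 2 or 4 bytes wide
            imm_len = operand_size // 8
            imm = int.from_bytes(code[pc + 1:pc + 1 + imm_len], "little")
            reg = "ax" if operand_size == 16 else "eax"
            text, length = f"mov {reg}, 0x{imm:x}", 1 + imm_len
        else:
            raise ValueError(f"opcode 0x{op:02x} not handled by this sketch")
        out.append((base + pc, text))
        pc += length
    return out


if __name__ == "__main__":
    for size in (16, 32):
        print(f"{size}-bit decoding:")
        for addr, text in decode(BOOT_CODE, 0x7C00, size):
            print(f"  0x{addr:x}: {text}")
```

With a 32-bit operand size the immediate swallows the two bytes of int 0x10, which matches the GDB output above.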

How can I tell GDB that this is 16-bit code?

Michael Petch
Ciro Santilli OurBigBook.com

5 Answers


As Jester correctly pointed out in a comment, you just need to use set architecture i8086 when using gdb so that it knows to assume 16-bit 8086 instruction format. You can learn about the gdb targets here.

I'm adding this as an answer because it was too hard to explain in a comment. If you assemble and link things separately you can generate debug information that can then be used by gdb to provide source level debugging even when done remotely against 16-bit code. To do this we modify your assembly file slightly:

;org 0x7c00    - remove as it may be rejected when assembling
;                with elf format. We can specify it on command
;                line or via a linker script.
bits 16

; Use a label for our main entry point so we can break on it
; by name in the debugger
main:
    cli
    mov ax, 0x0E61
    int 0x10
    hlt
    times 510 - ($-$$) db 0
    dw 0xaa55

I've added some comments to identify the trivial changes made. Now we can use commands like these to assemble our file so that it contains debug information in the DWARF format. We link it to a final ELF image, which gdb can use for symbolic debugging. We can then convert the ELF image to a flat binary with objcopy:

nasm -f elf32 -g3 -F dwarf main.asm -o main.o
ld -Ttext=0x7c00 -melf_i386 main.o -o main.elf
objcopy -O binary main.elf main.img

qemu-system-i386 -hda main.img -S -s &
gdb main.elf \
        -ex 'target remote localhost:1234' \
        -ex 'set architecture i8086' \
        -ex 'layout src' \
        -ex 'layout regs' \
        -ex 'break main' \
        -ex 'continue'

I've made some minor changes. I use the main.elf file (with symbolic information) when starting up gdb.

I also add some more useful layouts for assembly code and the registers that may make debugging on the command line easier. I also break on main (not the address). The source code from our assembly file should also appear because of the debugging information. You can use layout asm instead of layout src if you prefer to see the raw assembly.

This general concept can work on other formats supported by NASM and LD on other platforms. elf32 and elf_i386 as well as the debugging type will have to be modified for the specific environment. My sample targets systems that understand Linux Elf32 binaries.
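As a sanity check on the flat binary that objcopy produces, the following is a minimal Python sketch. The helper names `check_boot_sector` and `make_example_sector` are hypothetical and not part of any tool mentioned here. It assumes the image should be exactly one 512-byte sector ending in the bytes 55 AA, which is what `times 510 - ($-$$) db 0` followed by `dw 0xaa55` lays out.

```python
def check_boot_sector(image):
    """Return a list of problems found in a boot sector image (empty if none)."""
    problems = []
    if len(image) != 512:
        problems.append(f"expected 512 bytes, got {len(image)}")
    # dw 0xaa55 is stored little-endian, so the file ends with 55 AA
    if image[510:512] != b"\x55\xaa":
        problems.append("missing 55 AA signature at offset 510")
    return problems


def make_example_sector():
    """Build the same layout as main.asm by hand, for comparison."""
    code = bytes([0xFA, 0xB8, 0x61, 0x0E, 0xCD, 0x10, 0xF4])
    padding = bytes(510 - len(code))  # times 510 - ($-$$) db 0
    return code + padding + b"\x55\xaa"  # dw 0xaa55


if __name__ == "__main__":
    print(check_boot_sector(make_example_sector()))
```

To check a real build, you could read the file with `open("main.img", "rb").read()` and pass the bytes to `check_boot_sector`.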


Debugging 16-bit real mode bootloader with GDB/QEMU

Unfortunately by default gdb doesn't do segment:offset calculations and will use the value in EIP for breakpoints. You have to specify breakpoints as 32-bit addresses (EIP).
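For reference, real mode forms a linear address as segment * 16 + offset. A minimal Python sketch of that arithmetic follows; the function name `linear_address` is made up for illustration. In the example here CS is 0, so the linear address and IP coincide at 0x7c00.

```python
def linear_address(segment, offset):
    """Real-mode linear address: the 16-bit segment shifted left by 4, plus the 16-bit offset."""
    return ((segment & 0xFFFF) << 4) + (offset & 0xFFFF)


if __name__ == "__main__":
    # Two different segment:offset pairs that name the same byte
    print(hex(linear_address(0x0000, 0x7C00)))
    print(hex(linear_address(0x07C0, 0x0000)))
```

Because gdb does not do this calculation, the same byte can be reached through different CS:IP pairs while gdb only looks at the offset part.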

When it comes to stepping through real mode code it can be cumbersome because gdb doesn't handle real mode segmentation. If you step into an interrupt handler you'll discover gdb will display the assembly code relative to EIP. Effectively gdb will be showing you disassembly of the wrong memory location since it didn't account for CS. Thankfully someone has created a GDB script to help. Download the script to your development directory and then run QEMU with something like:

qemu-system-i386 -hda main.img -S -s &
gdb -ix gdbinit_real_mode.txt main.elf \
        -ex 'target remote localhost:1234' \
        -ex 'break main' \
        -ex 'continue'

The script takes care of setting the architecture to i8086 and then hooks itself into gdb. It provides a number of new macros that can make stepping through 16 bit code easier.

break_int : adds a breakpoint on a software interrupt vector (the way the good old MS DOS and BIOS expose their APIs)

break_int_if_ah : adds a conditional breakpoint on a software interrupt. AH has to be equal to the given parameter. This is used to filter service calls of interrupts. For instance, you sometimes only want to break when the function AH=0h of the interrupt 10h is called (change screen mode).

stepo : this is a kabalistic macro used to 'step-over' function and interrupt calls. How does it work? The opcode of the current instruction is extracted and if it is a function or interrupt call, the "next" instruction address is computed, a temporary breakpoint is added on that address and the 'continue' function is called.

step_until_ret : this is used to singlestep until we encounter a 'RET' instruction.

step_until_iret : this is used to singlestep until we encounter an 'IRET' instruction.

step_until_int : this is used to singlestep until we encounter an 'INT' instruction.
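The step-over idea behind stepo can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the script's actual logic: it only recognises three 16-bit encodings (near call E8, far call 9A, and INT CD), ignores prefixes and indirect calls, and the name `step_over_target` is hypothetical.

```python
# Instruction lengths in 16-bit mode for the opcodes this sketch handles
CALL_LIKE_LENGTHS = {
    0xE8: 3,  # call rel16
    0x9A: 5,  # call ptr16:16
    0xCD: 2,  # int imm8
}


def step_over_target(ip, code):
    """Return the offset for a temporary breakpoint, or None to single-step.

    ip is the offset of the current instruction and code holds the bytes
    starting at that instruction.
    """
    length = CALL_LIKE_LENGTHS.get(code[0])
    if length is None:
        return None
    # IP is 16 bits wide in real mode, so wrap around
    return (ip + length) & 0xFFFF


if __name__ == "__main__":
    # int 0x10 at 0x7c04 in the example boot sector
    print(hex(step_over_target(0x7C04, bytes([0xCD, 0x10]))))
```

In a debugger, the returned offset is where the temporary breakpoint would go before continuing; a result of None means a plain single step is enough.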

This script also prints out addresses and registers with segmentation calculated in. Output after each instruction execution looks like:

---------------------------[ STACK ]---
D2EA F000 0000 0000 6F62 0000 0000 0000
7784 0000 7C00 0000 0080 0000 0000 0000
---------------------------[ DS:SI ]---
00000000: 53 FF 00 F0 53 FF 00 F0 C3 E2 00 F0 53 FF 00 F0  S...S.......S...
00000010: 53 FF 00 F0 53 FF 00 F0 53 FF 00 F0 53 FF 00 F0  S...S...S...S...
00000020: A5 FE 00 F0 87 E9 00 F0 76 D6 00 F0 76 D6 00 F0  ........v...v...
00000030: 76 D6 00 F0 76 D6 00 F0 57 EF 00 F0 76 D6 00 F0  v...v...W...v...
---------------------------[ ES:DI ]---
00000000: 53 FF 00 F0 53 FF 00 F0 C3 E2 00 F0 53 FF 00 F0  S...S.......S...
00000010: 53 FF 00 F0 53 FF 00 F0 53 FF 00 F0 53 FF 00 F0  S...S...S...S...
00000020: A5 FE 00 F0 87 E9 00 F0 76 D6 00 F0 76 D6 00 F0  ........v...v...
00000030: 76 D6 00 F0 76 D6 00 F0 57 EF 00 F0 76 D6 00 F0  v...v...W...v...
----------------------------[ CPU ]----
AX: AA55 BX: 0000 CX: 0000 DX: 0080
SI: 0000 DI: 0000 SP: 6F2C BP: 0000
CS: 0000 DS: 0000 ES: 0000 SS: 0000

IP: 7C00 EIP:00007C00
CS:IP: 0000:7C00 (0x07C00)
SS:SP: 0000:6F2C (0x06F2C)
SS:BP: 0000:0000 (0x00000)
OF <0>  DF <0>  IF <1>  TF <0>  SF <0>  ZF <0>  AF <0>  PF <0>  CF <0>
ID <0>  VIP <0> VIF <0> AC <0>  VM <0>  RF <0>  NT <0>  IOPL <0>
---------------------------[ CODE ]----
=> 0x7c00 <main>:       cli
   0x7c01:      mov    ax,0xe61
   0x7c04:      int    0x10
   0x7c06:      hlt
   0x7c07:      add    BYTE PTR [bx+si],al
   0x7c09:      add    BYTE PTR [bx+si],al
   0x7c0b:      add    BYTE PTR [bx+si],al
   0x7c0d:      add    BYTE PTR [bx+si],al
   0x7c0f:      add    BYTE PTR [bx+si],al
   0x7c11:      add    BYTE PTR [bx+si],al
Mai Lapyst
Michael Petch
  • Question about the `step_until` macros in that script: http://stackoverflow.com/questions/14031930/break-on-instruction-with-specific-opcode-in-gdb I've done a generic command with the Python API to single step until any given opcode: http://stackoverflow.com/a/31249378/895245 – Ciro Santilli OurBigBook.com Oct 19 '15 at 09:50
  • Simplified Python alternative to `stepo` that steps over any instruction: http://stackoverflow.com/a/33212794/895245 – Ciro Santilli OurBigBook.com Oct 19 '15 at 11:11
  • Doesn't work anymore. I used to be able to debug bootloaders with `set architecture i8086` but it is now broken, `set architecture` has no effect on disassembly in GDB 8.0.1. – doug65536 Nov 19 '17 at 06:26
  • @doug65536 If I have a chance tomorrow I'll compile up 8.0.1 of GDB and have a look. – Michael Petch Nov 19 '17 at 06:32
  • To be completely clear, the instruction `mov $0x7c00,%si` disassembles as `mov $0xbf7c00,%esi` and the extra junk `bf 00` comes from the opcode of the following `mov $0x600,%di` whose opcode byte is `bf`. This is with `set architecture i8086` invoked before I attach to the target, and `show architecture` reports `i8086`. Thanks for having a look. – doug65536 Nov 19 '17 at 21:19
  • @doug65536 Yeah clearly if that is what it is doing it is decoding the 16-bit code as 32-bit code despite the architecture. If that is in fact true, that has to be a bug. – Michael Petch Nov 19 '17 at 21:48
  • Unfortunately the script mentioned above is no longer available. Your link works, but the link to the script from that site does not. – sherrellbc Feb 10 '18 at 01:23
  • @doug65536 I second your observation. I am also using GDB 8.0.1; perhaps this issue is fixed in later releases. I observe the note `The target architecture is assumed to be i8086` when setting the architecture, but the disassembly is shown as you describe. – sherrellbc Feb 10 '18 at 01:30
  • @sherrellbc : I have yet to try it with GDB 8 (as I suggested earlier there seems to be some issue) There is a link in my answer to a secondary copy of the [GDB Script](http://www.capp-sysware.com/misc/stackoverflow/gdb_init_real_mode.txt) itself. I put it on my server when I answered the question in the event the original was deleted lol – Michael Petch Feb 10 '18 at 01:35
  • Oops. Sorry. I did not notice the first link. Also, I just built the latest release of 8.1.0 ([link](ftp://sourceware.org/pub/gdb/snapshots/branch/gdb-weekly-8.1.0.20180206.tar.xz)) and it resolved the problem. However, the registers still show as extended. – sherrellbc Feb 10 '18 at 01:41
  • Interestingly, I just tried to build the same 8.1.0 release I linked in my previous comment on another machine and it did _not_ work. I am going to see if I can figure out the differences when I have both machines in the same place. @MichaelPetch, what version are you using that is working? I've tried several releases after 8.0; I'm going to try going back into 7.x now. – sherrellbc Feb 11 '18 at 18:40
  • @sherrellbc I finally tried it out with 8.1.0 that you linked to in your comment (I built it on my Debian system). It does seem to work like the versions I used previously. Strange – Michael Petch Feb 11 '18 at 18:50
  • How are you building it? I tried using `./configure --with-tui`. Interestingly, the machine on which the 8.1.0 link worked was also Debian-based (Ubuntu). I tried building on an independent distro (Arch Linux) and it does _not_ work. I wonder what dependency could be causing this different behavior ... There must be something detected on Debian-based distributions that alters the build configuration. Would you mind posting your `config.log`? – sherrellbc Feb 11 '18 at 19:37
  • I can confirm this issue is also present on Windows when compiling versions 8.0.1 or 8.1.0 with either the cygwin or mingw toolchain, on either x86 architecture. Here are the flags I use to configure the project for build: `./configure --enable-static --disable-shared --enable-64-bit-bfd --disable-werror --disable-win32-registry --disable-rpath --with-expat --with-zlib --with-lzma --enable-tui` – ajkhoury Feb 20 '18 at 15:53

The answers already provided here are correct but seem to misbehave with recent versions of gdb and/or qemu.

There is an open issue on sourceware with the current details.

TL;DR

When in real mode, qemu will negotiate the wrong architecture (i386), so you need to override it:

  1. Download the description file - target.xml (gist)
  2. Launch gdb and connect to your target (target remote ...)
  3. Set the target architecture using the description file - set tdesc filename target.xml

Setting architecture on gdb

Normally when you debug an ELF, PE or any other object file, gdb can infer the architecture from the file headers. When you debug a bootloader there is no object file to read, so you can tell gdb the architecture yourself (in the case of a bootloader, arch will be i8086):

set architecture <arch>

Note: When attaching to a qemu VM there is actually no need to tell gdb the desired architecture, qemu will negotiate this information for you over the qXfer protocol.

Overriding the target architecture

As mentioned above, when debugging qemu VMs, qemu will actually negotiate its architecture with gdb. When targeting 32-bit x86 the architecture is probably i386, which is not the architecture we want for real mode.

Currently there seems to be an issue in gdb that causes it to choose the most "featureful compatible architecture" between the target's architecture (i386) and the user-provided architecture (i8086). Because gdb sees i386 as a proper superset of i8086, it uses it instead. Choosing i386 causes all operands to default to 32 bits (instead of 16), and this is what causes the disassembler errors.

You can override the target architecture by specifying a target.xml description file:

set tdesc filename <file>

I made this description file from the qemu sources and changed the architecture to i8086.

Matan Shahar
  • I'm your upvote. It should be noted that in my example I actually build the bootloader as an ELF file and then generate the binary from the ELF file. You can boot the binary file in QEMU and specify the ELF file for the symbolic debug info when running with GDB. I actually do it that way so that I can get the symbolic information despite QEMU running the binary file (bootloader) – Michael Petch Mar 19 '19 at 18:06
  • Or at least tell me where in the sources did you find all the information to put together this XML. – SasQ May 13 '22 at 23:36
  • @SasQ I wrote this answer more than two years ago and haven't looked at the source since. I honestly barely remember what I did – Matan Shahar May 14 '22 at 10:17

The target description file seems to have changed in qemu, so the linked XML from Matan Shahar's answer doesn't work anymore. But you can do something like this:

$ echo '<?xml version="1.0"?><!DOCTYPE target SYSTEM "gdb-target.dtd"><target><architecture>i8086</architecture><xi:include href="i386-32bit.xml"/></target>' > target.xml
$ wget https://raw.githubusercontent.com/qemu/qemu/master/gdb-xml/i386-32bit.xml

And then:

(gdb) target remote localhost:26000
[...]
Breakpoint 1, 0x00007c00 in ?? ()
(gdb) x /5i 0x7c24
   0x7c24:      cli    
   0x7c25:      cld    
   0x7c26:      mov    eax,0xc08e0050
   0x7c2b:      xor    ebx,ebx
   0x7c2d:      mov    al,0x2
(gdb) set tdesc filename target.xml 
warning: A handler for the OS ABI "GNU/Linux" is not built into this configuration
of GDB.  Attempting to continue with the default i8086 settings.

(gdb) x /5i 0x7c24
   0x7c24:      cli    
   0x7c25:      cld    
   0x7c26:      mov    ax,0x50
   0x7c29:      mov    es,ax
   0x7c2b:      xor    bx,bx

Side note: set debug remote-packet-max-chars 2000 and set debug remote 1 gdb commands were useful to debug this.

Kirill Spitsyn
  • Thanks a ton ! Indeed the OP target.xml alas doesn't work anymore, but with yours it does !! – a427 Jun 19 '20 at 01:13
  • Remote 'g' packet reply is too long (expected 312 bytes, got 344 bytes): 000000000000000063060000000000… – mirabilos Mar 31 '22 at 23:29
  • Same problem for me, except with different numbers: "expected 348 bytes, got 536 bytes". GDB 10.2, QEMU 2.12.1. – SasQ May 13 '22 at 23:16

Update 2020-12-24

Tested with gdb 9.2.0 on Ubuntu 20.04.

Use the following target.xml from Kirill Spitsyn's answer:

$ echo '<?xml version="1.0"?><!DOCTYPE target SYSTEM "gdb-target.dtd"><target><architecture>i8086</architecture><xi:include href="i386-32bit.xml"/></target>' > target.xml
$ wget https://raw.githubusercontent.com/qemu/qemu/master/gdb-xml/i386-32bit.xml

Use a gdbinit based on gdb_init_real_mode.txt, renamed to gdb.txt.

Add the following line to gdb.txt under the # Real mode section:

set tdesc filename target.xml

After the edit, that section of gdb.txt reads:

# Real mode
set architecture i8086
set tdesc filename target.xml

Then build and test:

main.asm:

bits 16

; Use a label for our main entry point so we can break on it
; by name in the debugger
main:
    cli
    mov ax, 0x0E61
    int 0x10
    hlt
    times 510 - ($-$$) db 0
    dw 0xaa55

compile.fish:

nasm -f elf32 -g3 -F dwarf main.asm -o main.o
ld -Ttext=0x7c00 -melf_i386 main.o -o main.elf
objcopy -O binary main.elf main.img
qemu-system-i386 -hda main.img -S -s &
gdb --nx -ix gdb.txt main.elf \
        -ex 'target remote localhost:1234'

Result:

real-mode-gdb$ b main  
Breakpoint 1 at 0x7c00: file main.asm, line 6.  
real-mode-gdb$ c   
Continuing.  
---------------------------[ STACK ]---  
D002 F000 0000 0000 6F5E 0000 8016 0000   
8057 0000 0000 0000 0000 0000 8016 0000   
---------------------------[ DS:SI ]---  
00000000: 53 FF 00 F0 53 FF 00 F0 C3 E2 00 F0 53 FF 00 F0  S...S.......S...  
00000010: 53 FF 00 F0 54 FF 00 F0 53 FF 00 F0 53 FF 00 F0  S...T...S...S...  
00000020: A5 FE 00 F0 87 E9 00 F0 42 D4 00 F0 42 D4 00 F0  ........B...B...  
00000030: 42 D4 00 F0 42 D4 00 F0 57 EF 00 F0 42 D4 00 F0  B...B...W...B...  
---------------------------[ ES:DI ]---  
00000000: 53 FF 00 F0 53 FF 00 F0 C3 E2 00 F0 53 FF 00 F0  S...S.......S...  
00000010: 53 FF 00 F0 54 FF 00 F0 53 FF 00 F0 53 FF 00 F0  S...T...S...S...  
00000020: A5 FE 00 F0 87 E9 00 F0 42 D4 00 F0 42 D4 00 F0  ........B...B...  
00000030: 42 D4 00 F0 42 D4 00 F0 57 EF 00 F0 42 D4 00 F0  B...B...W...B...  
----------------------------[ CPU ]----  
AX: AA55 BX: 0000 CX: 0000 DX: 0080  
SI: 0000 DI: 0000 SP: 6F00 BP: 0000  
CS: 0000 DS: 0000 ES: 0000 SS: 0000  
  
IP: 7C00 EIP:00007C00  
CS:IP: 0000:7C00 (0x07C00)  
SS:SP: 0000:6F00 (0x06F00)  
SS:BP: 0000:0000 (0x00000)  
OF <0>  DF <0>  IF <1>  TF <0>  SF <0>  ZF <0>  AF <0>  PF <0>  CF <0>  
ID <0>  VIP <0> VIF <0> AC <0>  VM <0>  RF <0>  NT <0>  IOPL <0>  
---------------------------[ CODE ]----  
=> 0x7c00 <main>:       cli  
   0x7c01:      mov    ax,0xe61  
   0x7c04:      int    0x10  
   0x7c06:      hlt  
   0x7c07:      add    BYTE PTR [bx+si],al  
   0x7c09:      add    BYTE PTR [bx+si],al  
   0x7c0b:      add    BYTE PTR [bx+si],al  
   0x7c0d:      add    BYTE PTR [bx+si],al  
   0x7c0f:      add    BYTE PTR [bx+si],al  
   0x7c11:      add    BYTE PTR [bx+si],al  
  
Breakpoint 1, main () at main.asm:6  
6           cli  

Notes:

  1. It is strange that the target specification needs both

    set architecture i8086

    and

    set tdesc filename target.xml

  2. To handle the INT instruction, use stepo to step over it, or add a breakpoint on the instruction after the INT.

muhua
  • I tried this with gdb v10.2 and it didn't work; code still interpreted as 32-bit code! – user3524743 May 29 '21 at 02:58
  • 26th of August 2021, gdb v9.2, qemu-system-i386 v2.11.1, and it also doesn't work. What's wrong: gdb or qemu? – Jarda Pavlíček Aug 26 '21 at 19:02
  • @JardaPavlíček @user3524743, debian11 + gdb 10.1-7 + qemu-system-i386 5.2.0 + nasm 2.15.0 + objcopy 2.35.2 + ld 2.35.2, still works on my computer. – muhua Feb 15 '22 at 23:53
  • What does this do, why does it work (or rather, why doesn’t)? Debian bullseye. – mirabilos Mar 31 '22 at 23:25
  • Seems to be some sort of XML file that describes the format of data being exchanged between QEMU and GDB during debugging. QEMU reports its architecture as i386 in real mode, which GDB interprets as 32-bit instruction set and uses the wrong format for these data structures, getting garbage information from them, or having troubles with disassembling the instructions correctly as 8086 instruction set. – SasQ May 13 '22 at 23:21

It works with:

set architecture i8086

as mentioned by Jester.

set architecture is documented at: https://sourceware.org/gdb/onlinedocs/gdb/Targets.html and we can get a list of supported architectures with:

set architecture

(no arguments) or tab completion on the GDB prompt.

Community
Ciro Santilli OurBigBook.com
  • `set architecture` without parameters – Michael Petch Oct 05 '15 at 21:36
  • Not sure if this is of interest but on Debian or Ubuntu (the latter you are fond of these days) I recommend installing package `gdb-multiarch`. It has many more targets than Intel and is very useful if you happen to be using cross compilers targeting non-Intel systems (like PowerPC on QEMU etc). – Michael Petch Oct 05 '15 at 21:41
  • @MichaelPetch thanks for the info! I'll definitely keep that in mind when I venture into ARM-land one day :-) – Ciro Santilli OurBigBook.com Oct 05 '15 at 21:48
  • @CiroSantilli冠状病毒审查六四事件法轮功 I stumbled across this while searching for ways to compile flat/raw binaries. Several comments here mention the use of QEMU but I can't understand the relationship between it and the question. Can you explain how and why you are using QEMU in this scenario? – typedeaf Apr 26 '20 at 17:56
  • @typedeaf I want to run those flat binaries as baremetal executables in QEMU: https://github.com/cirosantilli/x86-bare-metal-examples and GDB debug them. – Ciro Santilli OurBigBook.com Apr 26 '20 at 18:38
  • Might be worth mentioning somewhere that BOCHS has a very good built-in debugger which is generally recommended over QEMU+GDB because it understands real-mode segmentation, and can help debug the switch to protected mode. – Peter Cordes May 24 '20 at 18:54