Let's say I have bootloader assembly code to debug that uses .code16 and .code32 to mark code for the different CPU modes it runs in. The bootloader targets a 64-bit x86 CPU.

Which mode should be used for disassembly (with tools like objdump, gdb, etc.): i8086, i386, or x86-64?

As far as I understand and have observed, we should use a combination of them, depending on which part of the code we are analyzing (.code16 or .code32), since that gives the results I expect.

For example:

.code16
mov %ax, %bx
mov %ecx, %edx

.code32
mov %eax, %ebx
mov %cx, %dx

Assembled like this:

$ as -o test.o test.S   # 16-bit and 32-bit code packed into a 64-bit ELF object; 64-bit is the default since the host is 64-bit

Disassembly in 16-bit mode: the 16-bit code is displayed correctly, whereas the 32-bit code is decoded wrongly.

$ objdump -m i8086 -d test.o

test.o:     file format elf64-x86-64


Disassembly of section .text:

0000000000000000 <.text>:
   0:   89 c3                   mov    %ax,%bx
   2:   66 89 ca                mov    %ecx,%edx
   5:   89 c3                   mov    %ax,%bx
   7:   66 89 ca                mov    %ecx,%edx

Disassembly in 32-bit mode: now the 32-bit code is disassembled correctly, but the 16-bit code is decoded wrongly.

$ objdump -m i386 -d test.o

test.o:     file format elf64-x86-64


Disassembly of section .text:

0000000000000000 <.text>:
   0:   89 c3                   mov    %eax,%ebx
   2:   66 89 ca                mov    %cx,%dx
   5:   89 c3                   mov    %eax,%ebx
   7:   66 89 ca                mov    %cx,%dx

Please confirm whether this strategy is correct; if not, what is the best way to disassemble mixed (16-, 32-, 64-bit) assembly code?

Peter Cordes
Naveen

2 Answers

2

There's no way for a disassembler to know what's supposed to be 16-bit code or 32-bit code, so you need to tell it explicitly. For example with objdump:

> objdump -m i8086 --stop-address 0x5 -D test.o

test.o:     file format pe-i386


Disassembly of section .text:

00000000 <.text>:
   0:   89 c3                   mov    %ax,%bx
   2:   66 89 ca                mov    %ecx,%edx

> objdump -m i386 --start-address 0x5 -D test.o

test.o:     file format pe-i386


Disassembly of section .text:

00000005 <.text+0x5>:
   5:   89 c3                   mov    %eax,%ebx
   7:   66 89 ca                mov    %cx,%dx
   a:   90                      nop
   b:   90                      nop

Since you're using this with bootloaders, you may also want to use the --adjust-vma option:

> objdump -m i8086 --adjust-vma 0x7c00 --stop-address 0x7c05 -D t457.o

t457.o:     file format pe-i386


Disassembly of section .text:

00007c00 <.text>:
    7c00:       89 c3                   mov    %ax,%bx
    7c02:       66 89 ca                mov    %ecx,%edx

If you're not building a binary bootloader, then you might want to consider putting the different code types into different sections to make it easier to select which part to disassemble (-j option of objdump).

Other command line disassemblers have options like these, for example ndisasm's -k option.
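The per-range invocations above can be generated mechanically once you know where each mode starts and ends. The following is only a sketch: `objdump_commands` and its `regions` format are hypothetical helpers, not part of binutils, and the `i386:x86-64` machine name for 64-bit code is an addition not shown in the examples above. It treats the stop address as exclusive, matching the observed behaviour, and assumes that with --adjust-vma the start/stop addresses are given in adjusted addresses, as in the 0x7c00 example.

```python
import shlex

# objdump -m names; i8086 and i386 appear in the examples above,
# "i386:x86-64" is assumed here for 64-bit code.
MACHINE = {16: "i8086", 32: "i386", 64: "i386:x86-64"}


def objdump_commands(path, regions, base=0):
    """Build one objdump command line per mode region.

    regions: list of (start_offset, end_offset, bits), end exclusive,
             offsets relative to the start of the code.
    base:    load address passed to --adjust-vma (0 means no adjustment).
    """
    commands = []
    for start, end, bits in regions:
        argv = ["objdump", "-D", "-m", MACHINE[bits]]
        if base:
            argv.append(f"--adjust-vma={base:#x}")
        # After --adjust-vma, the range is expressed in adjusted addresses.
        argv.append(f"--start-address={base + start:#x}")
        argv.append(f"--stop-address={base + end:#x}")
        argv.append(path)
        commands.append(shlex.join(argv))
    return commands


if __name__ == "__main__":
    # 16-bit code in bytes 0..4, 32-bit code in bytes 5..9, loaded at 0x7c00.
    for line in objdump_commands("test.o", [(0, 5, 16), (5, 10, 32)], base=0x7C00):
        print(line)
```

The commands are only printed, not executed, so the sketch can be checked without binutils installed; pipe the output to a shell if you want to run them.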

Ross Ridge
  • In objdump's help section `--stop-address=ADDR Only process data whose address is <= ADDR` seems incorrect. Shouldn't it be just `<` instead of `<=` ? This is what we get in the output as well. – Naveen Feb 14 '20 at 05:53
  • @InsaneCoder Yah, it seems to be an error in the documentation based on how it actually works. – Ross Ridge Feb 14 '20 at 06:03
  • Thanks. For the documentation error, I will report back to the binutils group in order to improve it. – Naveen Feb 14 '20 at 06:05
  • @RossRidge: I checked the latest binutils clone and it's already rectified there. Thanks. – Naveen Feb 14 '20 at 06:07
  • Perhaps worth being totally clear that there's also no way for the *CPU* to know what's supposed to be 16 or 32-bit code; nothing stops you from jumping to or falling into `.code16` code in 32-bit mode, or vice versa. This is why I suggested @InsaneCoder use BOCHS's debugger if you want to make sure you're disassembling in the same mode the CPU is truly executing, in a way that won't be fooled by bugs in your code or your build process. But neat trick with objdump start/stop ranges, certainly answers the direct question. – Peter Cordes Feb 14 '20 at 07:17
0

Your contrived example doesn't include a far jump that could change mode, so either disassembly is valid depending on which mode you expect the CPU to decode it in. Execution will continue through both chunks in one mode.
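To make that concrete, here is a toy decoder (a sketch, not a real disassembler) that handles only the one instruction form in the question: opcode 0x89 with a register-to-register ModRM byte, optionally preceded by the 0x66 operand-size prefix. The function name `decode_mov_rr` is made up for illustration. It decodes the same ten bytes once with a 16-bit default operand size and once with a 32-bit default.

```python
REG16 = ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"]
REG32 = ["e" + name for name in REG16]


def decode_mov_rr(code, default_bits):
    """Decode a run of `mov reg, reg` (opcode 0x89, ModRM.mod == 3) in AT&T syntax.

    default_bits is the default operand size of the assumed CPU mode (16 or 32).
    Anything other than this single instruction form raises ValueError.
    """
    if default_bits not in (16, 32):
        raise ValueError("this sketch only models 16- and 32-bit operand sizes")
    out = []
    i = 0
    while i < len(code):
        bits = default_bits
        if code[i] == 0x66:
            # The operand-size prefix selects the non-default size: 16 <-> 32.
            bits = 48 - default_bits
            i += 1
        if i + 1 >= len(code) or code[i] != 0x89 or code[i + 1] >> 6 != 3:
            raise ValueError(f"unsupported or truncated instruction at offset {i}")
        modrm = code[i + 1]
        src = (modrm >> 3) & 7  # ModRM.reg is the source for opcode 0x89
        dst = modrm & 7         # ModRM.rm is the destination
        regs = REG16 if bits == 16 else REG32
        out.append(f"mov %{regs[src]},%{regs[dst]}")
        i += 2
    return out


if __name__ == "__main__":
    blob = bytes.fromhex("89c36689ca89c36689ca")  # the bytes from the question
    for mode in (16, 32):
        print(f"{mode}-bit mode:")
        for insn in decode_mov_rr(blob, mode):
            print("   ", insn)
```

Both decodings consume the bytes without error; only the assumed mode differs, which is why neither objdump output in the question is "wrong" on its own.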

For example, in "Determine your language's version" I show all 3 ways to disassemble that block of machine code; all 3 are equally valid and produce different results intentionally. (It's polyglot machine code like "x86-32 / x86-64 polyglot machine-code fragment that detects 64bit mode at run-time?")


If you do have different blocks of code in a real use-case, presumably you'd want to look at 16-bit disassembly for the 16-bit parts, and 32-bit disassembly for the 32-bit parts. Or just read the source. Or get your assembler to generate a listing, like nasm -l /dev/stdout -fbin foo.asm. Then you'll get "the machine code" for each source line, according to what mode you told the assembler to assemble for.

GAS can also make listings, with as -a or gcc -c -Wa,-a (-Wa passes extra options directly to the assembler).

The listing includes only machine code hex and source line (including comments), not disassembly. So if you used tricks like .byte to manually encode an instruction, you won't see how the CPU will interpret it. For that, see Ross's answer or use a debugger.

$ as -a foo.s -o foo.o   # still creates an output file as normal
GAS LISTING foo.s                       page 1


   1                    .code16
   2 0000 89C3              mov %ax, %bx
   3 0002 6689CA           mov %ecx,   %edx  # comment
   4              
   5                    .code32
   6 0005 89C3            mov %eax, %ebx     # comment 1
   7 0007 6689CA          mov %cx, %dx       # comment 2

GAS LISTING foo.s                       page 2


NO DEFINED SYMBOLS

NO UNDEFINED SYMBOLS

The first column is the source line number, followed by the address (offset within the section) and the machine-code bytes in hex.

(I modified your source to vary the spacing and add comments, to double-check that it was just dumping the source line, not disassembly.)
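Since the listing pairs each offset with the source line, it can also tell you which byte ranges were assembled under which .codeNN directive. The sketch below parses listing text shaped like the output above; `mode_regions` is a hypothetical helper, the regular expressions are assumptions about the listing layout (4-digit hex offset, then a single run of hex bytes), and continuation lines for long instructions or data are not handled. Code before any directive is assumed to be 64-bit, as on a 64-bit host.

```python
import re

LISTING = """\
   1                    .code16
   2 0000 89C3              mov %ax, %bx
   3 0002 6689CA           mov %ecx,   %edx  # comment
   4
   5                    .code32
   6 0005 89C3            mov %eax, %ebx     # comment 1
   7 0007 6689CA          mov %cx, %dx       # comment 2
"""

# "<line no> .codeNN": a mode directive, emits no bytes.
DIRECTIVE = re.compile(r"^\s*\d+\s+\.code(16|32|64)\b")
# "<line no> <offset> <hex bytes> <source>": a line that emitted machine code.
ENCODED = re.compile(r"^\s*\d+\s+([0-9A-Fa-f]{4})\s+([0-9A-Fa-f]+)\s")


def mode_regions(listing, default_bits=64):
    """Return [(start, end, bits), ...] with end exclusive, merging adjacent lines."""
    bits = default_bits
    regions = []
    for line in listing.splitlines():
        directive = DIRECTIVE.match(line)
        if directive:
            bits = int(directive.group(1))
            continue
        encoded = ENCODED.match(line)
        if not encoded:
            continue  # blank line, label, comment, or page header
        start = int(encoded.group(1), 16)
        end = start + len(encoded.group(2)) // 2  # two hex digits per byte
        if regions and regions[-1][2] == bits and regions[-1][1] == start:
            regions[-1] = (regions[-1][0], end, bits)  # extend the previous region
        else:
            regions.append((start, end, bits))
    return regions


if __name__ == "__main__":
    for start, end, bits in mode_regions(LISTING):
        print(f"{start:#06x}-{end:#06x}: {bits}-bit")
```

The resulting ranges are the ones you would disassemble separately, each in its own mode.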

GAS listings default to stdout, with -ahls as the default listing options (see the gas man page). There is no "high-level source" (-h) for hand-written asm files; that option is for making listings from compiler output, but that's fine. There are other options for columns / pagination, like --listing-lhs-width=number.


You can also use an emulator with a built-in debugger (like BOCHS) to show disassembly in the mode the CPU is currently in. BOCHS knows about modes, real-mode segmentation, and so on. This is probably your best bet for making sure the right instructions are really executing. (You might want the source in another window; IDK if BOCHS can read debug info / source.)

Peter Cordes