What is the purpose of GNU assembler directive .code16?

Question

I do not understand the practical use of .code16 or the other .code* directives. From the answers to this question on StackOverflow, I understood that when someone puts .code16 in their assembly code and runs the following:

$ gcc -c -m32 -o main.o main.s

It ignores the .code16, and the output is meant to run on a 32-bit platform. If no -m flag is given, gcc seems to use its configured default, which depends on the host. So my conclusion is that a .code* directive is always ignored and superseded by the -m flag.

Please correct me if my understanding is wrong. In what situation would I use .code16, given that I can always pass -m16 and the .code* directive is going to be ignored anyway depending on the target mode?

Are .code16 and the others only meant to raise errors when data can't fit in 16-bit registers, and otherwise stay dormant?

No it doesn't ignore the `.code16`. It sets the default operand and address size to 16 bits. It generates code intended to run in 16 bit mode. It just happens to be stuffed into a 32 bit ELF binary. — Jester, Feb 02 '20 at 11:38
@Jester : Thanks. When you say "code intended to run in 16 bit mode" what does that exactly mean? I could not confirm it through objdump, because it parses the ELF depending on the `-M` flag passed. Hence, it generates different results for the same ELF ,e.g. `%ax` would be changed to `%eax` if `-M` defined as 32-bit instead of 16-bit while parsing. — Naveen, Feb 02 '20 at 11:41
Also `-m16` only has effect on code generated by `gcc`, not if you have your own hand written assembly. — Jester, Feb 02 '20 at 11:42
It means default operand and address sizes are 16 bit. There are prefixes `0x66` and `0x67` in the machine code which switch to the other mode. In 16 bit code you don't need prefixes for 16 bit stuff but you do for 32 bit. In 32 bit mode, it's the reverse. The assembler needs to know which mode you will be running in, so it can generate the prefixes as appropriate. `mov $42, %ax` in 16 bit mode is `b8 2a 00`, the same in 32 bit mode is `66 b8 2a 00`. — Jester, Feb 02 '20 at 11:45
Yes, the `-M` flag to `objdump` basically simulates how the cpu would see the instruction. So you can use that to test what would happen if you tried to run the code in the wrong mode. — Jester, Feb 02 '20 at 11:47

score 3 · Accepted Answer · answered Feb 02 '20 at 11:50

The only reason you normally ever have for using .code16, .code32, or .code64 is in a kernel or bootloader when you want to have machine code for different modes in the same file. e.g. a bootloader that starts in real mode (.code16) could enable protected mode and far jump (ljmp) to a 32-bit code segment. You'd want to use .code32 before that block of code.
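As a rough sketch of that layout (not taken from any particular bootloader): the labels `start`, `pm_entry`, `gdt` and `gdt_desc`, the selector values 0x08 and 0x10, and the assumption that the code is linked at the address it is loaded to are all illustrative choices.

```asm
# Hypothetical boot-sector fragment. Labels, selectors and the GDT layout are
# assumptions for illustration; assumes the code is linked at its load address.
        .code16                         # CPU starts in real mode: 16-bit defaults
        .globl  start
start:
        cli
        xor     %ax, %ax
        mov     %ax, %ds                # lgdt below uses a DS-relative address
        lgdt    gdt_desc
        mov     %cr0, %eax
        or      $1, %eax                # set CR0.PE; 32-bit operand gets a 0x66 prefix here
        mov     %eax, %cr0
        ljmp    $0x08, $pm_entry        # far jump loads CS with the 32-bit code selector

        .code32                         # everything below is decoded with 32-bit defaults
pm_entry:
        mov     $0x10, %ax              # 16-bit operand now needs the 0x66 prefix
        mov     %ax, %ds
        mov     %ax, %ss
1:      hlt
        jmp     1b

        .balign 8
gdt:
        .quad   0                       # null descriptor
        .quad   0x00cf9a000000ffff      # 0x08: flat 32-bit code segment
        .quad   0x00cf92000000ffff      # 0x10: flat 32-bit data segment
gdt_desc:
        .word   gdt_desc - gdt - 1      # GDT limit
        .long   gdt                     # GDT linear base address
```

The .code32 line sits exactly at the label that the far jump targets, because that is the first instruction the CPU will decode with 32-bit defaults.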

If that's not what you're doing, don't use them.

Using them in other cases just lets you shoot yourself in the foot: you can put 16-bit machine code into a 32-bit or 64-bit ELF executable, so you get a runtime failure instead of catching the mistake at build time (e.g. an assembler error because push %eax isn't valid in 64-bit mode). Don't put .code32 at the top of your 32-bit program; use a comment that says to assemble with gcc -m32.

These directives tell the assembler what mode the CPU will be in when it decodes these instructions. So it knows what the default operand-size and address size will be, and whether or not a prefix is needed for an instruction that uses a 32-bit or 16-bit register.

So for example mov %eax, (%ecx) assembles to 89 01 in 32-bit mode.

But after .code16, it assembles to 67 66 89 01.

If you then disassemble that as 32-bit machine code, it's 67 66 89 01 mov %ax, (%bx,%di) (because ModRM is different for memory operands in 16 vs. 32 and 64-bit mode).
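To see the prefix selection side by side, here is a small listing that can be assembled on its own. The byte sequences in the comments are the ones quoted in this answer and in Jester's comment under the question; the file name and the `as --32` invocation are assumptions.

```asm
# Assumed build: as --32 modes.s -o modes.o   (file name is hypothetical)
        .code32                     # CPU will decode with 32-bit defaults
        mov     %eax, (%ecx)        # 89 01          no prefixes needed
        mov     $42, %ax            # 66 b8 2a 00    operand-size prefix for 16-bit

        .code16                     # CPU will decode with 16-bit defaults
        mov     %eax, (%ecx)        # 67 66 89 01    address-size + operand-size prefixes
        mov     $42, %ax            # b8 2a 00       no prefixes needed
```

Each half only disassembles back to the source text when objdump is told, via its -M option, to decode in the same mode the directive announced.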

You wouldn't normally use .code16 manually. You can use gcc -m16 foo.c to get GCC to insert .code16gcc at the top of the file, so you can run it in 16-bit mode even though it will still use 32-bit operand-size and address-size (requiring a 386-compatible CPU).

If you wanted to include 32 or 16-bit machine code as data in a normal 64-bit program, e.g. so your program could write it to a file or modify a running process with it, you could also use .code32 or .code16.
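A minimal sketch of that use, assuming a 64-bit build (for example `as --64`). The symbol names `payload16`, `payload16_len` and `get_payload16` are made up for illustration.

```asm
# Hypothetical example; assumes a 64-bit build. Symbol names are made up.
        .section .rodata
        .globl  payload16
payload16:
        .code16                         # encode the following for 16-bit mode
        mov     $42, %ax                # b8 2a 00
        hlt                             # f4
        .code64                         # back to the mode the program really runs in
        .equ    payload16_len, . - payload16    # size in bytes of the blob above

        .text
        .globl  get_payload16
get_payload16:                          # returns the blob's address in %rax
        lea     payload16(%rip), %rax
        ret
```

The 16-bit bytes are never executed by this program; they are just data that happens to have been produced by the assembler's instruction encoder.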

Peter Cordes

But when I have added `.code16` and the follow-up instruction is `push $10,%eax`, the assembler does not throw any error, even though %eax is a 32-bit register name. What is assembler doing in this case? – Naveen Feb 02 '20 at 11:59
It adds the `66` prefix. As I said, you can toggle between 16 and 32 bit with a prefix. You can still access the 32 bit registers in 16 bit mode, if your cpu actually has them. – Jester Feb 02 '20 at 12:06
@InsaneCoder: 32-bit registers are available in 16-bit mode (on 386-compatible CPUs), **exactly** the same way that 16-bit registers and operand-size are available in 32-bit mode (and in 64-bit mode). One operand-size is the default, the operand-size prefix switches to the other one. – Peter Cordes Feb 02 '20 at 12:08
@Jester. Thanks. Does that mean , even though I specify `%ax` or `%eax`, I am going to use only 16-bit part of that same register, since I am in 16-bit mode as specified `.code16` – Naveen Feb 02 '20 at 12:09
No, it will use whatever you specify. You can use the full 32 bits of eax. The assembler will insert the prefix to make it so. – Jester Feb 02 '20 at 12:09
@InsaneCoder: No, the assembler uses the right prefixes so the instruction decodes as written. (Assuming it executes in the same mode you told the assembler it would). That's the whole point of all this: so you can write `add $12345, %eax` and have it run as `add $12345, %eax`, not `%ax` with 2 extra bytes of immediate left over (because in that case instruction length depends on operand-size). – Peter Cordes Feb 02 '20 at 12:10
Oh. does that mean , I can still use `mov $0xFFFFFF, %eax` in 16-bit mode as long as the machine supports it (i.e.32-bit registers or higher) and it will fail only in case the machine does not have sufficient enough registers to hold that data. – Naveen Feb 02 '20 at 12:12
@InsaneCoder “does not have sufficient enough registers to hold that data” – no. The code fails (or does something different) on processors that do not support 32 bit mode. There is no such thing as “not having sufficient enough registers.” – fuz Feb 02 '20 at 12:57
Is it also possible to use 32-bit addresses while running in 16-bit mode (provided the processor has 32 or more address lines)? The `0x67` prefix seems to do that: "Changes size of address expected by the instruction. 32-bit address could switch to 16-bit and vice versa." – Naveen Feb 02 '20 at 14:51
@InsaneCoder: yes, read an x86 manual, or the example in my answer which shows how `mov %eax, (%ecx)` assembles in 16-bit mode. Seriously, it's like if you asked "is 2+2 really 4?", then we explain that it is, and then you ask "is 2*3 really 6?" Come on, what part of what you read was hard to understand and make you want to ask me about it? You already found some documentation that fully explained it: "Changes size of address expected by the instruction. 32-bit address could switch to 16-bit and vice versa" – Peter Cordes Feb 02 '20 at 14:56
