How to determine AT&T size suffixes on assembly Instructions?

Question

I'm new to Assembly x86, using the AT&T format. Below I'm working on a question that is asking me to find the instruction suffix to each of the instructions, but I can't figure out any of these instruction's suffixes. I tried looking at past questions such as Chosing suffix (l-b-w) for mov instruction and How to determine the appropriate MOV instruction suffix based on the operands? but they haven't been that helpful and I still don't understand how these instructions' suffixes were determined. Can you explain each one of them by any chance?


All of them except `push` have their size implied by the bare register operand. — Peter Cordes, Nov 11 '21 at 14:29
What do you mean by the bare register operand? Is that the source register? So does that mean the suffix is determined by the size of your source register? Like %eax is 4 bytes wide so it'll be l, $0xFF is an immediate value so that's 1 byte wide etc? But why is (%eax) w? — amy37195, Nov 11 '21 at 14:40
@h0r53: `bl` and `dh` are each *one* byte. (The way you explained it could be misinterpreted). More importantly, `0xFF` is just a number; it doesn't have a size. It does not fit in a *signed* imm8 to be sign-extended to 32-bit, though, so `pushl $0xFF` has to be encoded as 5-byte `pushl imm32`, not 2-byte `pushl imm8`. The numeric value of any constant is completely irrelevant for implying the operand-size of any instruction, except in really terrible assemblers like emu8086, fortunately for everyone's sanity. Only the possible size of the machine-code encoding is affected. — Peter Cordes, Nov 11 '21 at 14:43
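To make the encoding point in the comment above concrete, here is a minimal C sketch of how an assembler might choose between the two encodings of `pushl $imm` in 32-bit mode. The helper names `fits_signed_imm8` and `pushl_imm_length` are hypothetical, and the sketch assumes the assembler always picks the shortest encoding. Note that the value changes only the encoded length, never the operand-size, which stays 32 bits.

```c
#include <stdint.h>

/* Hypothetical helper: nonzero if v can be stored as a sign-extended imm8. */
static int fits_signed_imm8(int32_t v) {
    return v >= -128 && v <= 127;
}

/* Hypothetical helper: byte length of `pushl $v` in 32-bit mode.
 * 2 bytes = opcode 0x6A + imm8, 5 bytes = opcode 0x68 + imm32.
 * Assumes the assembler chooses the shortest available encoding. */
static int pushl_imm_length(int32_t v) {
    return fits_signed_imm8(v) ? 2 : 5;
}
```

For example, `pushl_imm_length(0xFF)` gives 5, because 255 is outside the signed 8-bit range, while `pushl_imm_length(0x7F)` gives 2.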
`(%eax)` isn't a bare register (aka "register direct" in CS terminology), it's an addressing mode that uses 32-bit *address-size*. Address-size and operand-size are two independent properties of an x86 instruction. See [Assembly mov instruction without suffix](https://stackoverflow.com/q/52594200) which I added to the list of duplicates. (You almost always want to use 32-bit operand-size in 32-bit mode, but `mov %eax, (%di)` is encodeable, doing a 32-bit store to an address taken from the low 2 bytes of EDI.) And re: the width of `0xFF`, see my previous comment. — Peter Cordes, Nov 11 '21 at 14:45
Apologies for the confusion. The idea is that `eax` is 4 bytes, `ax` is 2 bytes, `ah` and `al` are 1 byte. The size of the data being moved is what determines the suffix to use (when it comes to registers). The exception is push, which is pushing 4 bytes total to the stack via `pushl`. I think this is because push by default supports much larger immediate values, such as a 32-bit address. — h0r53, Nov 11 '21 at 14:49
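The register-size rule described above can be sketched as a small lookup in C. This is only an illustration under stated assumptions: the function name `suffix_for_register` is hypothetical, it covers only the classic 32-bit-mode general-purpose registers, and it returns 0 for anything that is not a bare register, such as the memory operand `(%eax)`, because a memory operand implies no size.

```c
#include <string.h>

/* Hypothetical helper: AT&T size suffix implied by a bare register operand.
 * Returns 'l' (4 bytes), 'w' (2 bytes), 'b' (1 byte), or 0 if the operand
 * is not one of the registers listed here (e.g. a memory operand). */
static char suffix_for_register(const char *reg) {
    static const char *const regs32[] = {"eax", "ebx", "ecx", "edx", "esi", "edi", "ebp", "esp"};
    static const char *const regs16[] = {"ax", "bx", "cx", "dx", "si", "di", "bp", "sp"};
    static const char *const regs8[]  = {"al", "bl", "cl", "dl", "ah", "bh", "ch", "dh"};

    if (reg[0] == '%') {
        reg++;                      /* skip the AT&T register prefix */
    }
    for (size_t i = 0; i < 8; i++) {
        if (strcmp(reg, regs32[i]) == 0) return 'l';
        if (strcmp(reg, regs16[i]) == 0) return 'w';
        if (strcmp(reg, regs8[i]) == 0)  return 'b';
    }
    return 0;                       /* no size implied */
}
```

With this sketch, `suffix_for_register("%dx")` yields `'w'`, which matches `movw (%eax),%dx`: the suffix comes from `%dx`, not from the address register.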
@h0r53: Having a default for `push` is mostly because you *almost never* want to push any size other than the current mode. There is no `pushb`, and the use-cases for `pushw` in 32-bit mode are vanishingly rare, limited to weird hacks that temporarily leave the stack misaligned. It's just a matter of asm source-level syntax design, of course, nothing fundamental about how x86 machine code works. — Peter Cordes, Nov 11 '21 at 14:53
I'm still not getting the movw (%eax),%dx part. I get that %dx is 2 bytes wide and w is used for moving 2 bytes, but how many bytes is (%eax) moving? — amy37195, Nov 11 '21 at 14:53
@amy37195: `(%eax)` doesn't imply any operand-size. It uses the 4-byte EAX to reference memory at an address, because addresses are 32 bits wide. The width of the access is implied by the other operand, `%dx`, and by the instruction mnemonic `movw`. Instructions require both their operands to be the same size, with only a few exceptions like `movzbl` / `movswl` and other variations of zero- or sign-extending mov / loads, and shift/rotate by `%cl` like `shll %cl, %eax` — Peter Cordes, Nov 11 '21 at 14:57
Nothing is being moved _in_ `eax`; the parentheses in `(%eax)` mean the value at the location pointed to by `eax`. So `eax` holds a pointer, and you take two bytes from that location. — h0r53, Nov 11 '21 at 14:57
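A rough C model of the point made in the two comments above: the same address can be used for a 1-, 2-, or 4-byte load, and the width comes from the destination register and suffix, not from the pointer. The function names `load_b`, `load_w`, and `load_l` are hypothetical, and the bytes are combined explicitly in little-endian order to mimic x86 regardless of the host machine.

```c
#include <stdint.h>

/* Models movb (%eax), %dl : read 1 byte from the address. */
static uint8_t load_b(const uint8_t *addr) {
    return addr[0];
}

/* Models movw (%eax), %dx : read 2 bytes, lowest address = low byte. */
static uint16_t load_w(const uint8_t *addr) {
    return (uint16_t)(addr[0] | ((uint16_t)addr[1] << 8));
}

/* Models movl (%eax), %edx : read 4 bytes, lowest address = low byte. */
static uint32_t load_l(const uint8_t *addr) {
    return (uint32_t)addr[0]
         | ((uint32_t)addr[1] << 8)
         | ((uint32_t)addr[2] << 16)
         | ((uint32_t)addr[3] << 24);
}
```

Given memory bytes `0x11, 0x22, 0x33, 0x44` at the address, the word load produces `0x2211` and the long load produces `0x44332211`, while the pointer passed in is identical in all three cases.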
In C, `sizeof(int*) == sizeof(short*) == sizeof(char*)` (in normal implementations like on x86), even though the pointed-to types are 3 different sizes. — Peter Cordes, Nov 11 '21 at 14:58
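The C analogy above can be checked directly. The sketch below uses two hypothetical helper functions. It assumes a typical implementation where all object pointers share one size and where `char`, `short`, and `int` have strictly increasing sizes, which the C standard itself does not guarantee.

```c
#include <stddef.h>

/* Nonzero if the three pointer types have the same size
 * (true on typical implementations such as x86). */
static int pointer_sizes_match(void) {
    return sizeof(int *) == sizeof(short *)
        && sizeof(short *) == sizeof(char *);
}

/* Nonzero if the pointed-to types have three different,
 * increasing sizes (typically 1, 2, and 4 bytes). */
static int pointee_sizes_differ(void) {
    return sizeof(char) < sizeof(short)
        && sizeof(short) < sizeof(int);
}
```

Both functions are expected to return nonzero on x86, mirroring how `(%eax)` has a fixed address width while the data it points to can be accessed as a byte, word, or long.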
So just for clarification, movw (%eax),%dx is getting a value from an address %eax is pointing to, but it must be getting 2 bytes since it will be moved to %dx? — amy37195, Nov 11 '21 at 15:07
Also some other side questions, does pop behave like push when it comes to suffixes? Like push command suffixes are pretty much l all the time, are pop commands like that too? Also is it possible to move a smaller byte register into a bigger byte register but not vice versa? For example, movw %dx,%eax is possible but not movl %eax,%dx? And is movw %eax,%dx possible? — amy37195, Nov 11 '21 at 15:10
@amy37195 Yes, correct. The plain `mov` requires source and destination to have the same size. `dx` is a 16-bit register, so 2 bytes will be moved. And yes, `push` works like `pop`. Indeed, it works exactly like any other instruction as far as suffixes are concerned. — fuz, Nov 11 '21 at 15:22
As for your last question, neither of the two is possible. To move into a larger register, use a zero-extending or sign-extending instruction like `movzwl %dx, %eax`. To reduce the size, just move from the smaller register corresponding to the large register. E.g. `movw %ax, %dx` to move the low 16 bits of eax into dx. — fuz, Nov 11 '21 at 15:23
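The widening and narrowing moves mentioned above can be modelled in C as follows. This is a sketch only: the function names simply reuse the mnemonics for readability, and `low16` is a hypothetical name for what `movw %ax, %dx` does with the low half of `eax`.

```c
#include <stdint.h>

/* Models movzwl %dx, %eax : zero-extend 16 bits to 32 bits. */
static uint32_t movzwl(uint16_t src) {
    return (uint32_t)src;
}

/* Models movswl %dx, %eax : sign-extend 16 bits to 32 bits by
 * copying bit 15 into the upper 16 bits. */
static uint32_t movswl(uint16_t src) {
    return (src & 0x8000u) ? (0xFFFF0000u | src) : src;
}

/* Models movw %ax, %dx : keep only the low 16 bits of eax. */
static uint16_t low16(uint32_t eax) {
    return (uint16_t)(eax & 0xFFFFu);
}
```

For the input `0x8001`, `movzwl` gives `0x00008001` and `movswl` gives `0xFFFF8001`. For `0x12345678`, `low16` gives `0x5678`.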
So movw %dx, %eax and movw %eax,%dx are also impossible? And instead it should be movzwl %dx,%eax for the 1st one and movw %ax,%dx for the 2nd one? If that's so, I understood your explanation and it really helped, thanks! — amy37195, Nov 11 '21 at 15:36
@amy37195 Yes, correct. But note that these are not instructions given in the exercise you posted. The instructions you posted have memory operands, not register operands. Always happy to help you. — fuz, Nov 11 '21 at 16:31


0 Answers