I have several small, related questions about assembly, so I collected them in one post instead of opening several and spamming the forum.

All of them concern x86 assembly in AT&T syntax:

  1. Why can't I write something like:

     cmp %eax, $0x2
     jg goHere
    

Why isn't this compare allowed? The result isn't stored in either operand, so forbidding it doesn't seem to make sense. Note: I know I can solve this by reversing order and do jl instead of jg.

In AT&T syntax this is supposed to check whether 0x2 is greater than %eax.

  2. Why can I write:

     mov $41, %rax

     while I can't write:

     mov ($41), %rax

That's quite strange, I was told by someone using of braces doesn't matter in assembly.

  3. When storing a string in memory, say "ABC" at address 0x100, which of these two layouts should the memory have?

     0x100 - A, 0x101 - B, 0x102 - C

     0x100 - C, 0x101 - B, 0x102 - A

Peter Cordes
Dan
  • Please try to ask only one question per question. As for (1), there is no instruction with the operands in the other order. For (2), the parentheses indicate a memory operand loading from a symbol named `$41`. If you want grouping, put the parentheses after the dollar sign like `$(41)`. – fuz Oct 05 '21 at 08:07
  • As for (3), the first byte of the string is at the lowest address. – fuz Oct 05 '21 at 08:08
  • @fuz first byte from which direction? for 1 I asked why there is no such order? for 2 you didn't answer why it won't work... – Dan Oct 05 '21 at 09:08
  • "Note: I know I can solve this by reversing order and do `jl` instead of `jg`." The inverse of `jg` is actually `jle` or its alias `jng`. Also, note that the condition goes between the two operands [in Intel order](https://stackoverflow.com/questions/2397528/mov-src-dest-or-mov-dest-src/60596999#60596999). `cmp $2, %eax` \ `jg` will jump if `eax` is greater (signed) than 2. (Intel order gives `cmp eax, 2` for the same machine code.) – ecm Oct 05 '21 at 10:31
  • @Dan The first byte is the character “A.” As for (1) I don't know, ask the makers of the 8086 processor. It's probably because such an instruction would be redundant. For (2) the instruction does work (as in, it assembles). It just does something else than you expect and I think I explained what it does. – fuz Oct 05 '21 at 12:59
  • *That's quite strange, I was told by someone using of braces doesn't matter in assembly.* - that's obviously wrong for AT&T syntax. For example `mov (%rdi), %rax` and `mov %rdi, %rax` use memory vs. register source operands. Assuming you meant parentheses `()` not braces `{}`, anyway. – Peter Cordes Oct 05 '21 at 22:17
  • You can write `$(41)` to use parens as part of the numeric expression that forms the immediate operand, like `$(41+3)*2`. But there's only one `$` in front of everything. `$` decorates the whole immediate, not numeric literals. And `($42)` doesn't start with a `$` so it's not an immediate. – Peter Cordes Oct 05 '21 at 22:24
  • 3 separate, totally independent questions in one SO question is bad because at least one of them is a duplicate of an existing question, like the one about string byte-order (I'm pretty sure we have one about that). And probably also one about immediate "destination" `cmp`. As for `($41)`, it's pretty similar to something previous about `(42)` being a memory operand same as `42` (without a `$`), or probably `($42)` gets treated as a symbol name, just like `(foo)`, being a memory reference. Yup, `cmp ($42), %eax` assembles just fine, and `objdump -drwC` shows an `R_X86_64_32S` absolute relocation for `$42` – Peter Cordes Oct 05 '21 at 22:25

1 Answer

For 1, I have no special knowledge about the original design of x86, but I suspect that the reason we don't have `cmp %reg, $imm` is the relationship between `cmp` and `sub`. The instruction `cmp $imm, %reg` behaves exactly like `sub $imm, %reg`, with the one exception that for `cmp` the result of the subtraction is not written to the destination `%reg` but is discarded instead. The result of the subtraction is still computed internally by the CPU and used to set the flags. (At least this is true conceptually, and early CPUs almost certainly did the full subtraction; modern CPUs might have some optimizations that I don't know about.)
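To make the "same subtraction, result discarded" idea concrete, here is a minimal C sketch of a 32-bit model. Everything in it (`flags_t`, `alu_sub32`, `model_sub`, `model_cmp`, `cond_g`) is a hypothetical name invented for illustration, and it only models the four flags that the signed and unsigned conditional jumps look at; it is not a description of how any real CPU is built.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the flags consulted by jg/jl/je/ja/jb. */
typedef struct {
    bool zf, sf, cf, of;
} flags_t;

/* Shared datapath: compute *dst - src and derive the flags from it.
 * The only difference between "sub" and "cmp" is write_back. */
static flags_t alu_sub32(uint32_t *dst, uint32_t src, bool write_back)
{
    assert(dst != 0);
    uint32_t a = *dst;
    uint32_t r = a - src;                       /* wraps modulo 2^32 */
    flags_t f;
    f.zf = (r == 0);
    f.sf = ((r >> 31) & 1u) != 0;               /* sign bit of the result */
    f.cf = (a < src);                           /* unsigned borrow */
    f.of = ((((a ^ src) & (a ^ r)) >> 31) & 1u) != 0; /* signed overflow */
    if (write_back)
        *dst = r;
    return f;
}

/* Model of `sub $src, %dst`: result is stored. */
static flags_t model_sub(uint32_t *dst, uint32_t src)
{
    return alu_sub32(dst, src, true);
}

/* Model of `cmp $src, %dst`: same flags, result thrown away. */
static flags_t model_cmp(uint32_t dst, uint32_t src)
{
    return alu_sub32(&dst, src, false);
}

/* Condition tested by jg: ZF clear and SF equal to OF. */
static bool cond_g(flags_t f)
{
    return !f.zf && f.sf == f.of;
}
```

In this model, `model_cmp(5, 2)` and `model_sub` on a variable holding 5 return identical flags, but only `model_sub` changes the variable.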

So this means that `cmp` could be implemented nearly for free once you had `sub`. You could use almost exactly the same circuitry and/or microcode. But you still have to decode the instruction, and so they made this easy as well by giving `cmp` almost the same encodings as `sub`, differing only in one bit. For instance, `sub %reg, %reg/mem` is opcodes 28h/29h and `cmp %reg, %reg/mem` is opcodes 38h/39h, differing only in bit 4. That one bit just signals the CPU whether to write the result to the destination operand or discard it.
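The one-bit difference is easy to check by XOR-ing the opcode bytes quoted above. The following C sketch does just that; the enum names and the helpers `opcode_diff`, `is_single_bit` and `discards_result` are made up for this illustration, and `discards_result` is only meaningful for the four opcodes listed, not as a general x86 decoder.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Opcode bytes quoted in the answer (register source, reg/mem destination). */
enum {
    OP_SUB_BYTE = 0x28,   /* sub, 8-bit operands   */
    OP_SUB_WIDE = 0x29,   /* sub, wider operands   */
    OP_CMP_BYTE = 0x38,   /* cmp, 8-bit operands   */
    OP_CMP_WIDE = 0x39,   /* cmp, wider operands   */
    DISCARD_RESULT_BIT = 0x10  /* bit 4 */
};

/* Bits in which two opcode bytes differ. */
static uint8_t opcode_diff(uint8_t a, uint8_t b)
{
    return (uint8_t)(a ^ b);
}

/* True if exactly one bit is set. */
static bool is_single_bit(uint8_t x)
{
    return x != 0 && (x & (x - 1)) == 0;
}

/* Hypothetical helper: for the four opcodes above, bit 4 tells
 * whether the subtraction result is discarded (cmp) or stored (sub). */
static bool discards_result(uint8_t opcode)
{
    assert(opcode == OP_SUB_BYTE || opcode == OP_SUB_WIDE ||
           opcode == OP_CMP_BYTE || opcode == OP_CMP_WIDE);
    return (opcode & DISCARD_RESULT_BIT) != 0;
}
```

Here `opcode_diff(OP_SUB_WIDE, OP_CMP_WIDE)` is `0x10`, a single set bit, which is the bit 4 mentioned above.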

This made it natural to give `cmp` exactly the same forms as `sub`:

  • We have `sub %reg, %reg` so there is `cmp %reg, %reg`.

  • We have `sub %reg, mem` so there is `cmp %reg, mem`.

  • We have `sub mem, %reg` so there is `cmp mem, %reg`.

  • We have `sub $imm, %reg` so there is `cmp $imm, %reg`.

  • There is even a special short encoding of `sub $imm, %al/%ax/%eax/%rax` which has a parallel `cmp $imm, %al/%ax/%eax/%rax`.

But there is no encoding of `sub %reg, $imm` because that would be nonsense, so `cmp %reg, $imm` would have needed a new encoding that wouldn't be parallel to an existing one for `sub`. The designers presumably decided not to waste decoding transistors and opcode space on creating one, because after all it wouldn't really provide any new functionality: `cmp` is practically always used in conjunction with a conditional jump (or later, conditional set), and in that case you can always achieve the same thing by using `cmp $imm, %reg` and reversing the test in the subsequent conditional jump/set.
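"Reversing the test" here means mirroring the condition for swapped operands (`g` becomes `l`, `ge` becomes `le`), which is different from negating it (`g` becomes `le`). A small C sketch under that reading, for signed conditions only: `cond_holds`, `mirror_cond` and `mirror_is_equivalent` are hypothetical helpers written for this illustration, with `cond_holds(cc, a, b)` standing in for `cmp $b, %a` followed by `j<cc>`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Models AT&T `cmp b, a` followed by j<cc>: is "a <cc> b" true (signed)? */
static bool cond_holds(const char *cc, int32_t a, int32_t b)
{
    assert(cc != 0);
    if (strcmp(cc, "g") == 0)  return a > b;
    if (strcmp(cc, "ge") == 0) return a >= b;
    if (strcmp(cc, "l") == 0)  return a < b;
    if (strcmp(cc, "le") == 0) return a <= b;
    if (strcmp(cc, "e") == 0)  return a == b;
    if (strcmp(cc, "ne") == 0) return a != b;
    return false;              /* unknown condition in this sketch */
}

/* Condition to use after swapping the two cmp operands. */
static const char *mirror_cond(const char *cc)
{
    assert(cc != 0);
    if (strcmp(cc, "g") == 0)  return "l";
    if (strcmp(cc, "ge") == 0) return "le";
    if (strcmp(cc, "l") == 0)  return "g";
    if (strcmp(cc, "le") == 0) return "ge";
    return cc;                 /* e and ne are symmetric */
}

/* Brute-force check over a small range: testing cc on (a, b) must
 * agree with testing the mirrored condition on (b, a). */
static bool mirror_is_equivalent(const char *cc)
{
    for (int32_t a = -4; a <= 4; a++)
        for (int32_t b = -4; b <= 4; b++)
            if (cond_holds(cc, a, b) != cond_holds(mirror_cond(cc), b, a))
                return false;
    return true;
}
```

So the unencodable `cmp %eax, $0x2` followed by `jg` ("is 2 greater than `%eax`?") corresponds in this model to `cond_holds("g", 2, eax)`, which always agrees with `cond_holds("l", eax, 2)`, that is, `cmp $0x2, %eax` followed by `jl`.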

Nate Eldredge
  • Not so much a special relationship between cmp and sub, more that `cmp` is *not* special and has the same forms as all the other 2-operand ALU instructions that date back to 8086, including `sub`, `add`, `xor`, etc, including the no-modrm `cmp $imm8, %al` and `cmp $imm16, %ax` forms. So yes, as you say no immediate destination because the other ALU instructions all RMW their "destination", and not worth special casing an extra opcode. The odd one out is `test` which lacks a `test $sign_extended_imm8, r/m16` form. – Peter Cordes Oct 05 '21 at 22:01