
Consider a pared-down example of down-casting `unsigned` to `unsigned char`:

void unsigned_to_unsigned_char(unsigned *sp, unsigned char *dp)
{
  *dp = (unsigned char)*sp;
}

With `gcc -Og -S`, the above C code is translated into the following assembly:

movl    (%rdi), %eax
movb    %al, (%rsi)

Why isn't the C-to-assembly translation the following instead?

movb    (%rdi), %al
movb    %al, (%rsi)

Is it because this alternative is incorrect, or because `movl` is more conventional than `movb`, or because it has a shorter encoding?

Peter Mortensen
aafulei
    Related: [Why does GCC chose dword movl to copy a long shift count to CL?](https://stackoverflow.com/q/63571651) - again, `movl` is the most efficient instruction out of the multiple valid choices, in that case not even matching the operand-size of the C variable. Same for loading a `char` function arg from the stack, although I can't find that Q&A again. – Peter Cordes Jul 17 '21 at 16:13

2 Answers


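Writing to an 8-bit x86 register may incur an extra merge µop, when the new low byte has to be merged with the old upper bytes of the corresponding 32/64-bit register. It can also create an unexpected data dependency on the register's previous value. Writing a 32-bit register, by contrast, zero-extends into the full 64-bit register, so `movl (%rdi), %eax` does not depend on the old contents of `%rax` at all.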

For this reason, it is generally a good idea to write only the 32- or 64-bit variants of general-purpose registers on x86.
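The register-write rules above can be observed directly. The following is a minimal sketch, assuming GCC or Clang on x86-64 (with a plain-C model of the same rules as a fallback for other targets); the helper names `write_low_byte`, `write_low_dword` and `check_partial_register_writes` are hypothetical. It only shows the architectural result (a byte write keeps the old upper bits, a 32-bit write zeroes them); it cannot show the merge µop itself, whose cost varies by microarchitecture.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__x86_64__) && defined(__GNUC__)
/* Byte write: movb replaces only bits 0..7, so the register's old value
   is an input ("+q" = operand is both read and written). */
uint64_t write_low_byte(uint64_t reg, uint8_t byte)
{
    __asm__("movb %1, %b0" : "+q"(reg) : "m"(byte));
    return reg;
}

/* 32-bit write: movl zero-extends into bits 32..63, so the old value is
   not an input at all ("=r" = write-only output). */
uint64_t write_low_dword(uint64_t reg, uint32_t dword)
{
    uint64_t out;
    (void)reg; /* deliberately unused: the result cannot depend on it */
    __asm__("movl %1, %k0" : "=r"(out) : "m"(dword));
    return out;
}
#else
/* Plain-C model of the same x86-64 rules, for other targets. */
uint64_t write_low_byte(uint64_t reg, uint8_t byte)
{
    return (reg & ~(uint64_t)0xFF) | byte;
}

uint64_t write_low_dword(uint64_t reg, uint32_t dword)
{
    (void)reg;
    return dword;
}
#endif

/* The byte write keeps 0x112233445566_77, the dword write keeps nothing. */
void check_partial_register_writes(void)
{
    const uint64_t old = UINT64_C(0x1122334455667788);
    assert(write_low_byte(old, 0xAB) == UINT64_C(0x11223344556677AB));
    assert(write_low_dword(old, 0xDEADBEEFu) == UINT64_C(0x00000000DEADBEEF));
}
```

The asm constraints mirror the answer's point: the byte write needs `reg` as a read-write operand, while the 32-bit write is a pure output with no dependency on the previous register contents.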

fuz

The cast in your question is wholly unnecessary, because simple assignment converts the right operand to the type of the left operand (`unsigned char`) anyway. It therefore contributes nothing to the generated code: remove it and you will see no change in the output, and no errors or warnings with default options.
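To check the claim that the cast is redundant, here is a minimal sketch with two hypothetical functions, `with_cast` and `without_cast`, that differ only in the cast, plus a hypothetical `check_cast_is_redundant` that asserts they store the same byte. GCC accepts both silently with default options; opt-in flags such as `-Wconversion` do warn about the implicit narrowing in `without_cast`.

```c
#include <assert.h>

/* As in the question: explicit cast before the assignment. */
void with_cast(unsigned *sp, unsigned char *dp)
{
    *dp = (unsigned char)*sp;
}

/* Cast removed: simple assignment converts *sp to unsigned char anyway. */
void without_cast(unsigned *sp, unsigned char *dp)
{
    *dp = *sp;
}

/* Both versions store the same byte for any input value. */
void check_cast_is_redundant(unsigned v)
{
    unsigned char a = 0, b = 1; /* different initial values on purpose */
    with_cast(&v, &a);
    without_cast(&v, &b);
    assert(a == b);
}
```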

The right-hand-side dereference has type `unsigned int`, so that is what gets loaded. Given a 32-bit (or wider) data bus, there's no performance penalty for a 32-bit load compared with a byte load (modulo alignment issues).
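Loading the whole `unsigned` and then storing only `%al` is correct because C converts a value to an unsigned type by reducing it modulo one more than that type's maximum, which for an 8-bit `unsigned char` means keeping the low 8 bits. A portable sketch of that reasoning, assuming `CHAR_BIT == 8` as on x86-64; all function names are hypothetical:

```c
#include <assert.h>
#include <limits.h>

_Static_assert(CHAR_BIT == 8, "this sketch assumes 8-bit bytes, as on x86-64");

/* Models the compiled code: load all 32 bits (movl (%rdi), %eax),
   then keep only the low 8 bits (movb %al, (%rsi)). */
unsigned char load_then_take_low_byte(const unsigned *sp)
{
    unsigned full = *sp;                  /* whole-object load */
    return (unsigned char)(full & 0xFFu); /* low byte, i.e. %al */
}

/* The C conversion rule: reduce modulo UCHAR_MAX + 1 (256 here). */
unsigned char convert_per_c_rule(unsigned v)
{
    return (unsigned char)(v % (UCHAR_MAX + 1u));
}

/* Both agree with a plain (unsigned char) cast for any value. */
void check_low_byte_matches_conversion(unsigned v)
{
    assert(load_then_take_low_byte(&v) == (unsigned char)v);
    assert(convert_per_c_rule(v) == (unsigned char)v);
}
```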

If you want a byte load instead, you can cast the pointer before dereferencing it, as follows:

void unsigned_to_unsigned_char(unsigned *sp, unsigned char *dp)
{
  *dp = *(unsigned char *)sp;
}

Compiled with `-Os`, as in the link below, this produces the byte move instructions you're expecting.

https://godbolt.org/z/57nzrsrMe
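The pointer-cast version is not equivalent in general: it reads the byte at the lowest address of `*sp`, which is the low-order byte only on a little-endian machine such as x86 (the point Carsten S raises in the comments). A minimal sketch with hypothetical helper names that makes the difference checkable:

```c
#include <assert.h>

/* The pointer-cast version: the byte stored at the lowest address of *sp. */
unsigned char first_byte_in_memory(const unsigned *sp)
{
    return *(const unsigned char *)sp;
}

/* The value-cast version from the question: the low-order byte of *sp. */
unsigned char low_order_byte(const unsigned *sp)
{
    return (unsigned char)*sp;
}

/* 1 if the low-order byte is stored first (little-endian, e.g. x86). */
int is_little_endian(void)
{
    unsigned one = 1u;
    return first_byte_in_memory(&one) == 1;
}

/* The two versions agree only on little-endian machines. */
void check_byte_views(void)
{
    unsigned v = 0x11223344u;
    assert(low_order_byte(&v) == 0x44);
    if (is_little_endian())
        assert(first_byte_in_memory(&v) == 0x44);
}
```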

Erik Eidt
  • Thanks for the answer! I upvoted yours as well, as I believe it provides a different perspective. Meanwhile, I would like to note that the compilation link you provided uses `-Os` (optimize for size) instead of `-Og` (optimize for debugging experience). With `-Og`, the first assembly line for your C code would be `movzbl (%rdi), %eax` instead of `movb (%rdi), %al`. Looks like gcc has some complicated rules for choosing the assembly translation. – aafulei Jul 17 '21 at 15:53
  • @aafulei The `movb` instruction has a shorter encoding. It is only used in this case when optimising for size, to the detriment of performance. – fuz Jul 17 '21 at 16:21
  • @aafulei: The rules aren't *that* complicated: they pretty much amount to "avoid writing partial registers" [Why doesn't GCC use partial registers?](https://stackoverflow.com/q/41573502), except when optimizing for size takes precedence. – Peter Cordes Jul 17 '21 at 17:19
  • Yes, my bad for not calling out `-Os`. `-O3` produces `movzx`, which zero-extends but is still a byte-sized memory operation. – Erik Eidt Jul 17 '21 at 17:42
  • Of course, the code now depends on the endianness of the machine. – Carsten S Jul 18 '21 at 14:59