Consider the following code in c.c:
void f(unsigned char *a, long long *b)
{
*b = (long long)*a;
}
Compile it with
$ gcc -Og -S c.c
where
$ gcc --version
gcc (MinGW-W64 x86_64-posix-seh, built by Brecht Sanders) 10.2.0
and my machine runs 64-bit Windows 10.
Among other lines, the generated assembly contains the following:
01 movzbl (%rcx), %eax
02 movq %rax, (%rdx)
My question is: why isn't the first line written this way instead?
01 movzbq (%rcx), %rax
What if the higher 32 bits of %rax originally had some non-zero bits, and were not set to zero after movzbl (%rcx), %eax? Won't these non-zero bits (if any) get copied to (%rdx) by movq %rax, (%rdx)?
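A small test can probe this concern empirically. The sketch below reuses f from above and adds a hypothetical helper, widen_over_garbage (not part of the original code), which fills the destination with all-one bits before calling f and returns what f stored. It only checks the value that ends up in memory. Portable C cannot force the upper half of %rax to be non-zero before the call, so this is an observation, not a proof about register contents.

```c
/* The function from the question, unchanged. */
void f(unsigned char *a, long long *b)
{
    *b = (long long)*a;
}

/* Hypothetical helper for illustration: pre-fill the destination with
   all-one bits, call f, and return whatever f stored there. */
long long widen_over_garbage(unsigned char byte)
{
    long long dest = -1;   /* every bit of dest is set before the call */
    f(&byte, &dest);       /* f overwrites all 64 bits of dest */
    return dest;
}
```

If stale upper bits leaked into the store, widen_over_garbage(0xAB) would return something other than 0xAB. The C standard requires the result of the conversion to be exactly the value of the byte, so any correct compiler must produce code where this holds.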
A follow-up question: even if the above concern is unfounded, why isn't the first line still written this way?
01 movzbq (%rcx), %rax
That is, which rule makes the compiler translate this C code into assembly in the given way?
(I have some knowledge of C but am new to assembly.)
Update: I would like to clarify a few things after reading the comments (I appreciate all of them). One comment says the function is unnecessary and that I could just do the assignment directly. That is right. As another comment rightly puts it, this is a pared-down example. What I want to understand is simply why the C-to-assembly translation happens this way when casting an unsigned char to long long.