txt[0] = 0x4142; is an assignment to a char object, so the right-hand side is evaluated as an int and then implicitly converted to char. In practice only the low byte, 0x42, is stored.
The NASM equivalent is mov byte [rsp-4], 'BA'. Assembling that with NASM gives you the same warning as your C compiler:
foo.asm:1: warning: byte data exceeds bounds [-w+number-overflow]
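A minimal C sketch of the same narrowing. The function name store_into_char is hypothetical. It takes the value as a parameter so the compiler doesn't warn at compile time, and it uses unsigned char so the conversion is fully defined (modulo 256). For plain char, which may be signed, an out-of-range conversion gives an implementation-defined result instead.

```c
/* Assigning an int to a char-sized object converts the value to the object's
   type. For unsigned char the result is the value modulo 256, so only the
   low byte survives. */
unsigned char store_into_char(int value) {
    unsigned char txt[2] = {0, 0};
    txt[0] = value;   /* like txt[0] = 0x4142; only 0x42 ('B') is stored */
    return txt[0];
}
```

For example, store_into_char(0x4142) returns 0x42; the 0x41 byte is simply discarded, just like NASM truncating 'BA' to a byte.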
Also, modern C is not a high-level assembler. C has types, NASM doesn't (operand-size is on a per-instruction basis only). Don't expect C to work like NASM.
C is defined in terms of an "abstract machine", and the compiler's job is to make asm for the target CPU which produces the same observable results as if the C was running directly on the C abstract machine. Unless you use volatile, actually storing to memory doesn't count as an observable side-effect. This is why C compilers can keep variables in registers.
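A small sketch of the difference (both function names are hypothetical). Without volatile, only the final result is observable, so the compiler is free to drop dead stores and keep the variable in a register. Accesses to a volatile object are part of the observable behaviour, so they have to be performed as written.

```c
/* No volatile: the first store to `plain` is dead and can be dropped, and
   `plain` never needs to exist in memory. An optimizing compiler will
   typically compile this to just "return 42". */
int plain_stores(void) {
    int plain = 1;
    plain = 2;
    return plain + 40;
}

/* volatile: both stores through `reg` are observable side-effects, so both
   must be emitted, in program order. */
void volatile_stores(volatile int *reg) {
    *reg = 1;
    *reg = 2;
}
```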
And more importantly, things that are undefined behaviour according to the ISO C standard may still be undefined when compiling for x86. For example, x86 asm has well-defined behaviour for signed overflow: it wraps around. But in C it's undefined behaviour, so compilers can exploit this to make more efficient code for loops like for (int i=0 ; i<=len ;i++) arr[i] *= 2; without worrying that i<=len might always be true (which would happen if len == INT_MAX), giving an infinite loop. See What Every C Programmer Should Know About Undefined Behavior.
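A sketch contrasting the signed loop with an unsigned counterpart (the function names are hypothetical; arr must have at least len+1 elements, since i <= len includes len). The comments describe what the compiler is allowed to assume, not what any particular compiler is guaranteed to emit.

```c
/* Signed int: overflow of i++ would be UB, so the compiler may assume the loop
   runs exactly len+1 times (for len >= 0), e.g. to vectorize it without a
   special case for len == INT_MAX. */
void double_elems(int *arr, int len) {
    for (int i = 0; i <= len; i++)
        arr[i] *= 2;
}

/* Unsigned: wraparound is well-defined, so with len == UINT_MAX the condition
   i <= len is always true. "Overflow is UB" is not available as an argument
   for a finite trip count here. */
void double_elems_u(int *arr, unsigned len) {
    for (unsigned i = 0; i <= len; i++)
        arr[i] *= 2;
}
```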
Type-punning by pointer-casting other than to char* or unsigned char* (or __m128i* and other Intel SSE/AVX intrinsic types, because they're also defined as may_alias types) violates the strict-aliasing rule. txt is a char array, but I think it's still a strict-aliasing violation to write it through a uint16_t* and then read it back via txt[0] and txt[1].
Some compilers may define the behaviour of *(uint16_t*)txt = 0x4142, or happen to produce the code you expect in some cases, but you shouldn't count on it always working and being safe if other code also reads and writes txt[].
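If the goal is just to put 'B' and 'A' into txt[0] and txt[1] (which is what *(uint16_t*)txt = 0x4142 produces on a little-endian target like x86), one alternative sketch is to store each byte through the char lvalues themselves. The function name set_BA is hypothetical.

```c
/* Plain char stores: no type-punning, no alignment requirement, and the same
   result on little- and big-endian targets. */
void set_BA(char *txt) {
    txt[0] = 0x42;   /* 'B': the low byte of 0x4142 */
    txt[1] = 0x41;   /* 'A': the high byte of 0x4142 */
}
```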
Compilers (i.e. C implementations, to use the terminology of the ISO standard) are allowed to define behaviour that the C standard leaves undefined. But in a quest for higher performance, they choose to leave a lot of stuff undefined. This is why compiling C for x86 is not similar to writing in asm directly.
Many people consider modern C compilers to be actively hostile to the programmer, looking for excuses to "miscompile" your code. See the 2nd half of this answer on gcc, strict-aliasing, and horror stories, and also the comments. (The example in that answer is safe with a proper memcpy; the problem was a custom implementation of memcpy that copied using long*.)
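For contrast with a copy loop through long*, here is a minimal sketch of a hand-written copy that stays within the rules by going through unsigned char*. The name bytewise_copy is hypothetical; in real code you would just call memcpy.

```c
#include <stddef.h>

/* Copy n bytes through unsigned char lvalues. Character types may alias any
   object, so this doesn't violate strict aliasing whatever dst and src really
   point to. A loop through long* would, and could also make misaligned
   accesses. */
void *bytewise_copy(void *dst, const void *src, size_t n) {
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
    return dst;
}
```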
Here's a real-life example of a misaligned pointer leading to a fault on x86: gcc's auto-vectorization strategy assumed that some whole number of elements would reach a 16-byte alignment boundary, i.e. it depended on the uint16_t* being aligned.
Obviously, if you want your C to be portable (including to non-x86), you must use well-defined ways to type-pun. In ISO C99 and later, writing one union member and reading another is well-defined. (So is it in GNU C++ and GNU C89.)
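A minimal sketch of union-based punning in C99 (the union and function names are hypothetical). The byte you get back depends on the target's endianness; the value noted below assumes a little-endian target such as x86.

```c
#include <stdint.h>

/* Write one member, read another: well-defined in ISO C99 and later. */
union u16_bytes {
    uint16_t u16;
    unsigned char bytes[sizeof(uint16_t)];
};

/* Returns the byte stored at the lowest address of v's object representation:
   0x42 ('B') for v == 0x4142 on a little-endian target. */
unsigned char low_addressed_byte(uint16_t v) {
    union u16_bytes u;
    u.u16 = v;
    return u.bytes[0];
}
```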
In ISO C++, the only well-defined way to type-pun is with memcpy or other char* accesses, to copy object representations.
Modern compilers know how to optimize away memcpy for small compile-time constant sizes.
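#include <string.h>
#include <stdint.h>

// Well-defined: copies the object representation of src into p[0] and p[1].
void set2bytes_safe(char *p) {
    uint16_t src = 0x4142;
    memcpy(p, &src, sizeof(src));
}

// Type-punning through uint16_t*: a strict-aliasing violation if p points into
// a char array, and it also requires p to be suitably aligned for uint16_t.
void set2bytes_alias(char *p) {
    *(uint16_t*)p = 0x4142;
}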
Both functions compile to the same code with gcc, clang, and ICC for the x86-64 System V ABI:
# clang++6.0 -O3 -march=sandybridge
set2bytes_safe(char*):
    mov word ptr [rdi], 16706
    ret
Sandybridge-family doesn't have LCP decode stalls for 16-bit mov immediate, only for 16-bit immediates with ALU instructions. This is an improvement over Nehalem (see Agner Fog's microarch guide), but apparently gcc8.1 -march=sandybridge doesn't know about it, because it still likes to do:
# gcc and ICC
    mov eax, 16706
    mov WORD PTR [rdi], ax
    ret
As for the other idea in the question, to "define the array as a single variable ... and accessing the elements with ((uint8_t*) &txt)[0]":
Yes, that's fine, assuming that uint8_t is unsigned char, because char* is allowed to alias anything.
This is the case on almost any implementation that supports uint8_t at all, but it's theoretically possible to build one where it's not, e.g. one where uint8_t is a distinct extended integer type that doesn't get char's exemption from the aliasing rule.
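If you do rely on uint8_t being unsigned char, a sketch of a compile-time check uses C11 _Generic inside _Static_assert, so an unusual implementation fails to compile instead of silently miscompiling. The function name byte_of_u16 is hypothetical, and the byte values noted below assume a little-endian target such as x86.

```c
#include <stdint.h>

/* Fails to compile unless uint8_t is exactly unsigned char, the assumption
   that makes ((uint8_t*)&txt)[i] a character-type access. */
_Static_assert(_Generic((uint8_t)0, unsigned char: 1, default: 0),
               "uint8_t is not unsigned char on this implementation");

/* txt defined as a single uint16_t variable, its bytes read via uint8_t*. */
uint8_t byte_of_u16(const uint16_t *txt, int i) {
    return ((const uint8_t *)txt)[i];
}
```

With uint16_t txt = 0x4142; on x86, byte_of_u16(&txt, 0) is 0x42 and byte_of_u16(&txt, 1) is 0x41.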