
Can a byte pointer ever be safely passed to vld2q_u16? I'm mostly concerned about static analyzer complaints.

uint16x8x2_t load_interleaved_shorts (const uint8_t* const ptr) {
    uint16_t* p16 = (uint16_t*)ptr; // possible undefined behavior ?
    return vld2q_u16(p16);
}

In my instance: The pointer is always aligned to a 16-byte boundary. The compiler doesn't know the alignment of the pointer. The code must be portable and strictly follow the C90 standard.

Assumptions: Replacing vld2q_u16 with vld1q_u8 / vuzpq_u8 would hurt performance. The probability of the compiler optimizing a scalar pattern into a vld2q_u16 is small.

Edit: I suppressed some warnings by casting through a void pointer: vld2q_u16((const uint16_t*)(const void*)src)

aqrit

1 Answer


The code must be portable and strictly follow the C90 standard.

... plus everything implied by the presence of ARM NEON intrinsics! (Although that may not help a static analyzer). (Related: Is `reinterpret_cast`ing between hardware SIMD vector pointer and the corresponding type an undefined behavior? discusses that for x86, but your case is a bit different; your pointers are aligned).


In C, it's safe to cast between pointer types (without dereferencing) as long as you never create a pointer with insufficient alignment for its type. You don't need a compile-time-visible guarantee of alignment, you just need to not ever actually create a uint16_t* that doesn't have alignof(uint16_t) alignment.

(This makes it unlikely for a static analyzer to complain even if that wasn't the case, unless it could see something like (uint16_t*)(1 + (char*)&something_aligned) where you take an aligned address and offset it by an odd number, which would be guaranteed to produce a misaligned address.)

And in practice, compilers targeting byte-addressable machines do more or less define the behaviour even for creating misaligned pointers. (For example, Intel intrinsics for unaligned loads depend on creating an unaligned __m128i*.) That holds only as long as you don't dereference them: a direct dereference of a misaligned pointer is unsafe even in practice on targets that allow unaligned loads; see my answer on this Q&A for an example and the blog links that cover other examples.

So you're 100% fine: your code never creates a misaligned uint16_t*, and doesn't directly dereference it.
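A minimal sketch of the wrapper with its precondition made explicit, assuming (as the question states) that the pointer is always suitably aligned. The helper name `is_aligned` and the debug `assert` are illustrative additions, not part of the original code, and the NEON part is guarded so the helper still compiles on non-ARM hosts. Converting a pointer to `uintptr_t` to inspect its low bits is implementation-defined, but it behaves as expected on mainstream flat-address targets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: nonzero if p is a multiple of `align` bytes.
   Relies on the implementation-defined pointer-to-integer mapping
   being a flat address, as on mainstream ARM and x86 targets. */
int is_aligned(const void *p, size_t align)
{
    return ((uintptr_t)p % align) == 0;
}

#if defined(__ARM_NEON)
#include <arm_neon.h>

/* The cast goes through const void*, which the question reports as
   silencing some warnings. The assert documents the precondition and
   checks it in debug builds; no uint16_t* is dereferenced directly. */
uint16x8x2_t load_interleaved_shorts(const uint8_t *ptr)
{
    assert(is_aligned(ptr, sizeof(uint16_t)));
    return vld2q_u16((const uint16_t *)(const void *)ptr);
}
#endif
```

The assert only needs `sizeof(uint16_t)` alignment, since that is all the pointer type itself requires; the 16-byte alignment mentioned in the question is stricter than necessary for the cast to be valid.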

If ARM has unaligned-load intrinsics, it would even be safe to form a misaligned uint16_t* and pass it to the function; the existence/design of the intrinsics API implies that it's safe to use it that way.


Other things that are undefined behaviour but which you aren't doing:

  • It's technically UB to form a pointer that isn't pointing inside an object, or one-past-end, but in practice mainstream implementations allow that as well.

  • It's strict-aliasing UB to dereference a uint16_t* that doesn't point to uint16_t objects. But any dereferencing only happens inside intrinsic "functions", so you don't have to worry about the strict-aliasing rule. (The intrinsic may pointer-cast to some special type and deref, or may pass the pointer on to a __builtin_arm_whatever() compiler built-in.)

I assume that ARM load/store intrinsics are defined similar to memcpy, being able to read/write the bytes of any object. So e.g. you could vld2q_u16 on an array of int, double, or char. Intel intrinsics are defined that way (e.g. GCC/clang use __attribute__((may_alias)).) If not, it wouldn't be safe.
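For comparison, the memcpy-like semantics assumed here can be spelled out in plain C for a single element. This is only a sketch of the aliasing model, not a replacement for the vector load, and `load_u16` is a hypothetical name. `memcpy` may read the bytes of any object, so the call is valid whatever the declared type or alignment of the source.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar analogue of a may_alias load: copies the two bytes
   at src into a uint16_t without ever dereferencing a uint16_t*.
   Valid for any source object type and any alignment. */
uint16_t load_u16(const void *src)
{
    uint16_t v;
    memcpy(&v, src, sizeof v);
    return v;
}
```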

And BTW, the char*-can-alias-anything rule only works one way. Yes, it's safe to point a char* at a uint16_t, but if you have an actual array char buf[100], those objects are definitely char objects, and it's UB to access them through a uint16_t*. However, if you only have a char*, and only one pointer type other than char* is ever used to access the memory, then you can treat the memory as holding objects of that other type, with every char* access aliasing them.
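The permitted direction can be sketched as follows: inspecting the bytes of real `uint16_t` objects through an `unsigned char*`. The function name is hypothetical. The reverse direction is the undefined case, so it appears only inside a comment.

```c
#include <stddef.h>
#include <stdint.h>

/* Allowed: character-type lvalues may access the bytes of any object.
   Sums the object representation of a uint16_t array byte by byte. */
unsigned long sum_bytes_of_u16(const uint16_t *vals, size_t count)
{
    const unsigned char *bytes = (const unsigned char *)vals;
    unsigned long sum = 0;
    size_t i;
    for (i = 0; i < count * sizeof *vals; ++i)
        sum += bytes[i];
    return sum;
}

/* Not allowed (strict-aliasing UB), even if buf happens to be aligned:
       char buf[100];
       uint16_t x = *(uint16_t *)buf;
*/
```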

Peter Cordes
  • gcc casts to `__builtin_neon_hi *` but I can't find info on that type. – aqrit May 23 '21 at 00:38
  • @aqrit: GCC integer type size codes are "si" = single integer (i.e. `int`), "hi" = half-integer, "di" = double-integer, etc. So I assume it's a builtin name for a NEON vector of half-int (int16_t) elements. IDK where to find more general details on `__builtin_neon_*`, but a google search found GCC source code saying those names are *not* user-visible. https://github.com/gcc-mirror/gcc/blob/15d30d2f20794d29ceabcfd57d230d6387284115/gcc/config/arm/arm-builtins.c#L1360. – Peter Cordes May 23 '21 at 00:46
  • You can construct test cases to see whether they respect aliasing, e.g. store a uint16, vector load, store another uint16, and vector load again. If the optimizer doesn't realize that the vector load aliases the uint16_t assignment, it can eliminate the first one as a "dead store". – Peter Cordes May 23 '21 at 00:47
  • Lost cause, just got `error: cast from 'const BYTE *' (aka 'const unsigned char *') to 'const __m128i *' increases required alignment from 1 to 16 [-Werror,-Wcast-align]` – aqrit May 23 '21 at 18:38
  • @aqrit: ah, yes, static analysis will sometimes warn about things that *can* be part of legal code, but can also be signs of problems. Sounds like you should just use `-Wno-cast-align`, or enable that via pragma for some functions. Don't make your code worse just to keep an over-cautious compiler warning happy. (Or possibly you can use `__attribute__((aligned(16)))` on the pointer type? But that can easily get you into a mess where *each* `aligned_char` element has size 16, including padding, instead of just promising the compiler that one specific pointer is aligned.) – Peter Cordes May 23 '21 at 18:43
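
The pragma route mentioned in the last comment might look like this with GCC or clang. This is a sketch under the assumption that the warning in question is `-Wcast-align`; `as_u16_ptr` is a hypothetical name, and the caller remains responsible for the pointer actually being aligned for `uint16_t`.

```c
#include <stdint.h>

#if defined(__GNUC__)   /* also defined by clang */
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wcast-align"
#endif

/* Hypothetical helper: reinterprets a byte pointer that the caller
   guarantees is aligned for uint16_t. No dereference happens here. */
const uint16_t *as_u16_ptr(const uint8_t *p)
{
    return (const uint16_t *)p;
}

#if defined(__GNUC__)
#pragma GCC diagnostic pop
#endif
```

Keeping the suppression scoped to one small helper leaves the warning active for the rest of the translation unit.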