
Usually one should be wary of transmuting (or casting) pointers to a higher alignment. Yet the interface to the above functions requires *const __m128i and *mut __m128i pointers, respectively. Both are SIMD-aligned, which means I'd need to keep my arrays SIMD-aligned, too. On the other hand, the intrinsics are explicitly designed to load/store unaligned data.

Is this safe? Shouldn't we change the interface? Or at least document this fact?

mcarton
llogiq

1 Answer


I think this is a cross-language duplicate of "Is `reinterpret_cast`ing between hardware vector pointer and the corresponding type an undefined behavior?"

As I explained over there, Intel defined the C/C++ intrinsics API such that loadu / storeu can safely dereference an under-aligned pointer, and that it's safe to create such pointers, even though it's UB in ISO C++ even to create under-aligned pointers. (Thus implementations that provide the intrinsics API must define the behaviour).

The Rust version should work identically. Implementations that provide it must make it safe to create under-aligned __m128i* pointers, as long as you don't dereference them "manually".
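As a minimal sketch of that pattern (assuming an x86_64 target, where SSE2 is part of the baseline), the pointer cast below only changes the type so it can be handed to the intrinsics; the `__m128i` pointer is never dereferenced directly. The function name `increment_16_bytes` is hypothetical and exists only for illustration.

```rust
use std::arch::x86_64::{
    __m128i, _mm_add_epi8, _mm_loadu_si128, _mm_set1_epi8, _mm_storeu_si128,
};

/// Adds 1 (wrapping) to the first 16 bytes of `src` and writes the result
/// to the first 16 bytes of `dst`. Neither slice needs 16-byte alignment.
pub fn increment_16_bytes(src: &[u8], dst: &mut [u8]) {
    // The intrinsics read and write exactly 16 bytes, so check the bounds.
    assert!(src.len() >= 16 && dst.len() >= 16);
    unsafe {
        // Cast only; the possibly under-aligned pointer goes straight to
        // the unaligned-load intrinsic and is never dereferenced manually.
        let v = _mm_loadu_si128(src.as_ptr() as *const __m128i);
        let r = _mm_add_epi8(v, _mm_set1_epi8(1));
        // Same for the store: the cast pointer is only passed to storeu.
        _mm_storeu_si128(dst.as_mut_ptr() as *mut __m128i, r);
    }
}
```

Passing sub-slices that start at an odd offset, such as `&buf[1..17]`, exercises the under-aligned case. Writing `*ptr` on the cast pointer instead would be the "manual" dereference that still requires 16-byte alignment.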

The other API-design option would be to have another version of the type that doesn't imply 16-byte alignment, like a __m128i_u or something. GNU C does this with their native vector syntax, but that's way off topic for Rust.

Peter Cordes
  • "Thus implementations that provide the intrinsics API must define the behaviour" => "can"? If ISO says it's UB, it's UB, not implementation-defined. – Stargateur Sep 06 '18 at 20:45
  • @Stargateur: If ISO C++ says "implementation defined", every implementation *must* define the behaviour (in one way or another). If ISO C++ says something is UB, implementations *can* define the behaviour if they choose, and then code written to depend on such behaviour is no longer correct ISO C++. e.g. `gcc -fwrapv` defines the behaviour of signed-integer overflow as 2's complement wraparound, like you get in assembly language. But code that depends on `__m128i` being available in the first place can only work on the same limited set of implementations, so there's no downside. – Peter Cordes Sep 06 '18 at 21:31