
I can use `_mm_set_epi64` to store two `uint64_t`s into a `__m128` vector. But hunting around, I see various ways to get the values back out: there's `reinterpret_cast` (and its evil twin, C-style casts), its sibling `union { __m128; uint64_t[2]; };`, and `memcpy`, and there's accessing the fields of `__m128` directly. There's `__m128i _mm_load_si128(__m128i *p);`, but I'm not seeing a `_mm_get_*` function. Am I missing something? If there's a `_mm_set_epi64`, then there must be a non-cast way to get the `uint64_t`s back out, right? (Otherwise, why would they bother providing `_mm_set_epi64`?)
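For reference, a minimal sketch of the `memcpy` route mentioned above, assuming the vector really is an integer `__m128i` (the float type `__m128` holds four `float`s). It uses `_mm_set_epi64x`, the variant that takes two 64-bit integers; `_mm_set_epi64` takes MMX `__m64` arguments. Copying the bytes with `memcpy` is well-defined in both C and C++, unlike pointer casts that violate strict aliasing. The helper names `pack_pair` and `unpack_with_memcpy` are hypothetical.

```cpp
#include <cstdint>
#include <cstring>
#include <emmintrin.h>  // SSE2: __m128i, _mm_set_epi64x

// Hypothetical helper: _mm_set_epi64x takes the HIGH element first,
// so element 1 = hi and element 0 = lo.
inline __m128i pack_pair(std::uint64_t hi, std::uint64_t lo) {
    return _mm_set_epi64x(static_cast<long long>(hi), static_cast<long long>(lo));
}

// Hypothetical helper: copy both 64-bit lanes of v into out[0] (low) and out[1] (high).
// memcpy sidesteps the strict-aliasing problems of reinterpret_cast / C-style casts.
inline void unpack_with_memcpy(__m128i v, std::uint64_t out[2]) {
    static_assert(sizeof(__m128i) == 2 * sizeof(std::uint64_t), "expected a 16-byte vector");
    std::memcpy(out, &v, sizeof v);
}
```

On x86 (little-endian), element 0 lands in `out[0]`. Compilers typically turn a fixed-size `memcpy` like this into a plain store or register move, so it usually costs nothing extra.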

I see "Get member of __m128 by index?", but the "correct answer" there has a broken link and implies there's a load function, yet all the loads I see map `__m128` to `__m128`. Shouldn't there be a `void _mm_get_epi64(__m128, uint64_t* outbuf)`?
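A function with the shape proposed above can be written on top of `_mm_storeu_si128`, the whole-vector store that Peter Cordes's comment recommends. This is a sketch assuming an integer `__m128i` input; `get_epi64` is a hypothetical name, not an Intel intrinsic.

```cpp
#include <cstdint>
#include <emmintrin.h>  // SSE2: __m128i, _mm_storeu_si128

// Hypothetical helper with the signature proposed in the question.
// Writes element 0 (low half) to outbuf[0] and element 1 (high half) to outbuf[1].
// _mm_storeu_si128 has no alignment requirement, and as part of the intrinsics API
// it is treated as aliasing-safe even though outbuf does not point to a __m128i.
inline void get_epi64(__m128i v, std::uint64_t* outbuf) {
    _mm_storeu_si128(reinterpret_cast<__m128i*>(outbuf), v);
}
```

The caller must provide room for two `uint64_t`s; the store always writes all 16 bytes.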

Edited by Peter Cordes; asked by Ben
  • I believe what you want is `_mm_extract_epi64`. – Raymond Chen May 24 '21 at 12:32
  • I see, so given `__int64 _mm_extract_epi64 (__m128i a, const int imm8)` it would be `uint64_t x = static_cast<uint64_t>(_mm_extract_epi64(a, 0))`. – Ben May 24 '21 at 12:59
  • Looks like it. Give it a try. If it works, post an answer and accept it. – Raymond Chen May 24 '21 at 16:44
  • Do you actually have a `__m128` (4 floats), or a `__m128i` (integer)? The Q&A you linked is about getting a `float` element out of a `__m128`, and is considering runtime-variable indices. – Peter Cordes May 24 '21 at 20:10
  • On https://software.intel.com/sites/landingpage/IntrinsicsGuide you'll find `__int64 _mm_cvtsi128_si64x(__m128i)` to get the low 64 bits, otherwise you need to shuffle or use SSE4.1 `pextrq` (`_mm_extract_epi64`). – Peter Cordes May 24 '21 at 20:10
  • Or if you just want to store the whole vector together, there's `_mm_storeu_si128( (__m128i*)outbuf, vec )`. And yes that is strict-aliasing safe as part of the intrinsics API. – Peter Cordes May 24 '21 at 20:12
  • [SSE: Difference between \_mm\_load/store vs. using direct pointer access](https://stackoverflow.com/q/11034302) shows how things are normally done if you want to store whole vectors. [How to extract bytes from an SSE2 \_\_m128i structure?](https://stackoverflow.com/q/39884960) shows how to get separate `uint64_t` halves out, if that's what you want. – Peter Cordes May 24 '21 at 20:19
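Pulling the comments together, here is a sketch of per-element extraction without going through memory, assuming x86-64 (`_mm_cvtsi128_si64` is only available in 64-bit mode). The low half comes from `_mm_cvtsi128_si64` (SSE2 `movq`). For the high half it uses SSE4.1 `_mm_extract_epi64` (`pextrq`) when the compiler is targeting SSE4.1, and otherwise falls back to an SSE2 shuffle (`_mm_unpackhi_epi64`) followed by `_mm_cvtsi128_si64`. The helper names `low_u64` and `high_u64` are hypothetical.

```cpp
#include <cstdint>
#include <emmintrin.h>      // SSE2: _mm_cvtsi128_si64, _mm_unpackhi_epi64
#ifdef __SSE4_1__
#include <smmintrin.h>      // SSE4.1: _mm_extract_epi64
#endif

// Element 0 (low 64 bits): a single movq with plain SSE2 on x86-64.
inline std::uint64_t low_u64(__m128i v) {
    return static_cast<std::uint64_t>(_mm_cvtsi128_si64(v));
}

// Element 1 (high 64 bits).
inline std::uint64_t high_u64(__m128i v) {
#ifdef __SSE4_1__
    // pextrq; the lane index must be a compile-time constant.
    return static_cast<std::uint64_t>(_mm_extract_epi64(v, 1));
#else
    // SSE2 fallback: copy the high half into the low half, then movq.
    return static_cast<std::uint64_t>(_mm_cvtsi128_si64(_mm_unpackhi_epi64(v, v)));
#endif
}
```

Building with `-msse4.1` (or any `-march` that includes it) on GCC or Clang selects the `pextrq` path. Compilers that don't define `__SSE4_1__`, such as MSVC, take the shuffle path, which gives the same result.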
