Meaning of suffix "x" in intrinsics like "_mm256_set1_epi64x"

Question

Some intrinsics use the suffix x, like _mm256_set1_epi64x. What does it mean? For reference, _mm256_set1_epi32 comes without this suffix.

I *think* the "x" means the intrinsic (and corresponding instruction) is available only on 64-bit targets. But I'm not certain of that, and I can't offer any proof. It's just a pattern I've noticed a couple of times. The other question you ask here is really off-topic; we don't do requests for external resources. — Cody Gray - on strike, Jul 08 '17 at 18:24
I think it's just when there are two variants of an intrinsic, an original "historical" variant and a subsequent, more useful one, e.g. `_mm_set1_epi64` takes an `__m64` parameter, whereas `_mm_set1_epi64x` takes an `int64_t`. — Paul R, Jul 08 '17 at 20:44
@PaulR: Actually `__int64`, which isn't necessarily the same thing as `int64_t`. They might not be alias-compatible. One could be `long` while the other is `long long`, so the compiler could assume that an `int64_t *` didn't alias an `__int64 *`. But yes, the history is MMX->SSE2 conversion intrinsics for 32-bit code took the non-x names, I think. — Peter Cordes, Jul 09 '17 at 14:02
@PeterCordes: Intel and Microsoft seem to favour `__int64`, while gcc uses `long long`, presumably because `__int64` is non-standard. — Paul R, Jul 09 '17 at 20:08
@PeterCordes I want to pass uint64_t to _mm256_set1_epi64x, but it looks like it accepts long long. Do you happen to know the right API for 64-bit integers? — gansub, Sep 26 '18 at 14:22
@gansub: `long long` is a 64-bit integer. (And is exactly 64 bits on all compilers that support Intel's intrinsics, not wider, and doesn't munge unsigned integers with casting. `epi64x` means 64-bit integer, just use it) — Peter Cordes, Sep 26 '18 at 14:24
@PeterCordes Thanks for the prompt response. I modified this file https://github.com/flippingbits/cssl/blob/master/skiplist.c#L286 just as you said, but after the modification I am getting core dumps. Is this on topic for this site? — gansub, Sep 26 '18 at 14:27
@gansub: not as a comment thread. If you didn't change the rest of the code to `_pd` to match the element width of `_mm256_set1_epi64x`, then it probably doesn't work. If you can make a [mcve] of your problem, and you can't figure it out from using a debugger on that MCVE, then post a new question. `_mm256_set1_epi64x` itself isn't the problem. — Peter Cordes, Sep 26 '18 at 14:38
@PeterCordes I made the modifications you recommended, and now it does not compile. I will post the compilation errors as a question. FYI, I am not really an assembly programmer; this is my first time with AVX. — gansub, Sep 26 '18 at 14:54

Peter Cordes · Accepted Answer · 2017-07-09T15:20:50.397

TL;DR: MMX->SSE2 conversion intrinsics took the non-x _mm_set/set1_epi64 names.

This is all guesswork based on current function names, known history, and some compiler behaviour:

The first Intel SIMD intrinsics were for MMX. __m64 is the MMX equivalent of SSE2 __m128i and AVX2 __m256i. There were no 64-bit x86 CPUs at the time, so the widest set intrinsic was __m64 _mm_set_pi32 (int e1, int e0). According to the intrinsic-finder, there still isn't any intrinsic for movq mm0, rax. I think you can/should just cast int64_t to __m64. (Although the last time I experimented, within the last year or so, gcc or clang (I forget which) did a poor job optimizing the MMX asm. Aging compiler support is yet another reason to avoid MMX for new projects.)
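A minimal sketch of the two routes described above, assuming GCC or Clang on x86 with <mmintrin.h>. `_mm_set_pi32` builds an `__m64` from two 32-bit halves (first argument is the high half), and the hypothetical helper `m64_from_i64` gets an int64_t's bits into an `__m64` with `memcpy`, which is a portable stand-in for the cast. `_mm_empty()` (EMMS) is included because MMX registers alias the x87 stack.

```c
#include <mmintrin.h>
#include <stdint.h>
#include <string.h>

/* Widest MMX set intrinsic: two 32-bit halves, e1 = high half, e0 = low half. */
int64_t i64_via_set_pi32(int32_t hi, int32_t lo) {
    __m64 m = _mm_set_pi32(hi, lo);
    int64_t out;
    memcpy(&out, &m, sizeof out);   /* read the 8 bytes back as a scalar */
    _mm_empty();                    /* EMMS: leave the x87/MMX state clean */
    return out;
}

/* Hypothetical helper: put a 64-bit scalar's bits into an __m64.
   memcpy is used instead of a cast so this does not rely on compiler extensions. */
__m64 m64_from_i64(int64_t x) {
    __m64 m;
    memcpy(&m, &x, sizeof m);       /* bit copy; both types are 8 bytes */
    return m;
}
```

Code that goes on to use the returned `__m64` in MMX operations should still call `_mm_empty()` before any x87 floating-point code.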

When SSE2 was introduced in 2001, AMD64 / x86-64 hadn't been released yet, and Intel wouldn't support it for a few more years. (At that time they were hoping that IA-64 / Itanium would be the future and replace x86.) I haven't checked old manuals, but I guess that:

- __m128i _mm_set1_epi64 (__m64 a) was available back then, and
- __m128i _mm_set1_epi64x (__int64 a) probably wasn't. (Notice that __int64 is not int64_t from <stdint.h>. But it is a 64-bit integer type and is nothing to worry about.)
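To make the two signatures concrete, here is a small sketch, assuming GCC or Clang on x86-64 where SSE2 is always enabled. The wrapper names `bcast_from_m64`, `bcast_from_i64` and `lane64` are hypothetical; both wrappers should produce the same vector, and only the source type differs.

```c
#include <emmintrin.h>

/* Non-x name: the source is an MMX __m64 (MSVC does not offer this in 64-bit mode). */
__m128i bcast_from_m64(__m64 m) {
    return _mm_set1_epi64(m);
}

/* x name: the source is a plain 64-bit integer (long long in GCC/Clang, __int64 in MSVC). */
__m128i bcast_from_i64(long long x) {
    return _mm_set1_epi64x(x);
}

/* Read one 64-bit lane back out through memory (unaligned store is fine here). */
long long lane64(__m128i v, int i) {
    long long tmp[2];
    _mm_storeu_si128((__m128i *)tmp, v);
    return tmp[i];
}
```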

The epi stands for Extended(?) Packed Integer. epi instead of pi tells you it's an SSE intrinsic, not an MMX intrinsic. For intrinsics that convert from one element width to another, the intrinsics use the source width if that unambiguously identifies the operation (at least for the ones I looked at), e.g. _mm_packs_epi32 (packssdw) or _mm_unpackhi_epi16 (punpckhwd). PMOVZX needs both numbers, because there's _mm_cvtepu8_epi32 (pmovzxbd), _mm_cvtepu8_epi64 (pmovzxbq), etc.
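As an illustration of "the name uses the source width", here is a sketch using SSE2's `_mm_packs_epi32`, assuming an x86-64 GCC or Clang build. The suffix says epi32 even though the result holds eight signed-saturated 16-bit elements; the wrapper name `pack_i32_to_i16` is hypothetical.

```c
#include <emmintrin.h>
#include <stdint.h>

/* packssdw: four int32 from a, then four from b, each saturated to int16.
   The "epi32" suffix names the 32-bit SOURCE elements, not the 16-bit result. */
void pack_i32_to_i16(const int32_t a[4], const int32_t b[4], int16_t out[8]) {
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    __m128i packed = _mm_packs_epi32(va, vb);  /* out[0..3] from a, out[4..7] from b */
    _mm_storeu_si128((__m128i *)out, packed);
}
```

For example, an input of 70000 comes out as 32767 and -70000 as -32768.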

Compilers did of course support 64-bit integers in 32-bit mode, so it would have made sense for Intel to include intrinsics for working with them. But IIRC, in some compilers the 64x intrinsics are only available when compiling 64-bit code. The 64x is only relevant for converting to/from scalar 64-bit integers, so you won't find an x version of _mm_add_epi64 or anything like that.
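If a 32-bit build's headers did lack `_mm_set1_epi64x`, one workaround is to split the 64-bit integer into 32-bit halves and use `_mm_set_epi32`. This is a hedged sketch, not taken from any compiler's headers; `set1_epi64_via_epi32` is a hypothetical name. Vector-to-vector operations such as `_mm_add_epi64` never take a scalar, which is why they have no x form.

```c
#include <emmintrin.h>
#include <stdint.h>

/* Hypothetical fallback: broadcast a 64-bit integer using only 32-bit scalar inputs. */
__m128i set1_epi64_via_epi32(int64_t x) {
    uint64_t u = (uint64_t)x;
    /* uint32_t -> int keeps the bit pattern on x86 compilers
       (implementation-defined in ISO C before C23). */
    int lo = (int)(uint32_t)u;
    int hi = (int)(uint32_t)(u >> 32);
    /* Arguments run from the highest 32-bit element to the lowest,
       so each 64-bit lane becomes hi:lo == x. */
    return _mm_set_epi32(hi, lo, hi, lo);
}
```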

This only-in-64bit thing may still exist for _mm256_set1_epi64x depending on the compiler, but either way that history explains why 64x but not 32x.
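For the AVX case from the question, a sketch assuming GCC or Clang: `__attribute__((target("avx")))` compiles just this function for AVX so the file builds without `-mavx`, and callers must check CPU support first (for example with `__builtin_cpu_supports("avx")`). `broadcast_both` is a hypothetical name. There is no `_mm256_set1_epi64` taking an `__m64`, and no 32x form, only `_mm256_set1_epi32`.

```c
#include <immintrin.h>
#include <stdint.h>

/* GCC/Clang extension: enable AVX for this function only. */
__attribute__((target("avx")))
void broadcast_both(int64_t x64, int32_t x32, int64_t out64[4], int32_t out32[8]) {
    __m256i a = _mm256_set1_epi64x(x64);  /* 64-bit elements: x-suffixed name */
    __m256i b = _mm256_set1_epi32(x32);   /* 32-bit elements: no x variant */
    _mm256_storeu_si256((__m256i *)out64, a);
    _mm256_storeu_si256((__m256i *)out32, b);
}
```

Passing a uint64_t as `x64` works through an ordinary implicit conversion; GCC and Clang keep the bit pattern even for values above INT64_MAX.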

(Sorry I'm lazy and didn't put together an experiment on Godbolt to check for current compilers with -m32. It might be interesting to see what kind of asm you get from casting int64_t to __m64 and using a _mm_set intrinsic in 32-bit code.)
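One possible starting point for that experiment, assuming GCC or Clang: both accept a direct cast from a 64-bit integer to `__m64` as a vector extension (MSVC does not). Pasting this into Godbolt with something like `-O2 -m32 -msse2` would let you compare the asm of the two hypothetical functions; they should return identical vectors.

```c
#include <emmintrin.h>
#include <stdint.h>

/* Cast each int64_t to __m64 (GCC/Clang vector extension), then use the MMX-typed set. */
__m128i set_via_m64_cast(int64_t hi, int64_t lo) {
    return _mm_set_epi64((__m64)hi, (__m64)lo);
}

/* Same result from the scalar x-variant. */
__m128i set_via_epi64x(int64_t hi, int64_t lo) {
    return _mm_set_epi64x(hi, lo);
}
```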

I just checked the Visual Studio versions I have. 2012 does not have the `64x` intrinsics in 32-bit mode; 2015 and all newer versions do. None of them provide the `64` variants in 64-bit mode, due to [not supporting MMX in 64-bit mode](https://stackoverflow.com/a/32446304/2945027): 2012 has header declarations that produce a link error, while 2015 and newer remove them from the header. — Alex Guteniev, Aug 19 '23 at 20:38
