Analog of _mm256_cmp_epu32_mask for AVX/AVX2

Question

Let x be a __m256i containing eight 32-bit unsigned integers.

I want a mask of type __m256 (float) indicating whether each value in x is greater than the corresponding uint32 in another __m256i vector.

I don't have AVX-512, so I can't use _mm256_cmp_epu32_mask.

FOLLOW-UP:

Thank you @EOF. I've got this:

__m256i maxOfBoth = _mm256_max_epu32(val, t);
__m256i greater_than = _mm256_cmpeq_epi32(maxOfBoth, val);
__m256 mask = _mm256_castsi256_ps(greater_than);

mask seems to contain NaNs where the ones should be. Maybe my cast is wrong. How do I make sure that mask contains only 0.0s and 1.0s?

`vpcmpgtd` compares signed numbers, but you can flip the sign bit of both operands before the comparison to get the same result as an unsigned comparison: https://godbolt.org/z/553KsxoET — EOF, May 25 '21 at 17:16
Though clang seems to prefer computing the maximum of the vector elements and comparing that to the elements, which avoids loading a constant (the vector of uint32s with only the sign bit set) from memory: https://godbolt.org/z/914TvoT7E — EOF, May 25 '21 at 17:23
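The two idioms from these comments can be sketched as follows. This is a minimal illustration, not the code behind either godbolt link: the helper names are hypothetical, and the `target("avx2")` attribute is a GCC/Clang assumption used so the file builds without `-mavx2`. Both versions produce a strict greater-than; comparing the maximum against the first operand without the final inversion would give greater-or-equal instead.

```c
#include <immintrin.h>
#include <stdint.h>

/* Assumption: GCC or Clang; the attribute enables AVX2 per function. */
#define AVX2_FN __attribute__((target("avx2")))

/* a > b (unsigned): flip the sign bit of both operands, then use the
   signed compare (vpcmpgtd). */
AVX2_FN static inline __m256i cmpgt_epu32_flip(__m256i a, __m256i b)
{
    const __m256i sign = _mm256_set1_epi32(INT32_MIN);
    return _mm256_cmpgt_epi32(_mm256_xor_si256(a, sign),
                              _mm256_xor_si256(b, sign));
}

/* a > b (unsigned): a <= b exactly when max(a, b) == b, so compute that
   and invert it. No sign-bit constant is needed. */
AVX2_FN static inline __m256i cmpgt_epu32_max(__m256i a, __m256i b)
{
    __m256i le = _mm256_cmpeq_epi32(_mm256_max_epu32(a, b), b);
    return _mm256_xor_si256(le, _mm256_set1_epi32(-1));
}

/* Array wrappers (hypothetical): each out[i] is 0xFFFFFFFF if
   a[i] > b[i], else 0. */
AVX2_FN void cmpgt_u32x8_flip(const uint32_t *a, const uint32_t *b,
                              uint32_t *out)
{
    __m256i va = _mm256_loadu_si256((const __m256i *)a);
    __m256i vb = _mm256_loadu_si256((const __m256i *)b);
    _mm256_storeu_si256((__m256i *)out, cmpgt_epu32_flip(va, vb));
}

AVX2_FN void cmpgt_u32x8_max(const uint32_t *a, const uint32_t *b,
                             uint32_t *out)
{
    __m256i va = _mm256_loadu_si256((const __m256i *)a);
    __m256i vb = _mm256_loadu_si256((const __m256i *)b);
    _mm256_storeu_si256((__m256i *)out, cmpgt_epu32_max(va, vb));
}
```

Which variant is faster depends on the surrounding code; the max-based one avoids loading the sign-bit constant but needs an all-ones vector for the inversion.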
AFAIK this is the usual convention for masks - "true" is signaled with all-bits-one, which is NaN when interpreted as a `float`. One simple approach would be to AND the result with a vector filled with `1.0`. — Nate Eldredge, May 25 '21 at 18:42
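A minimal sketch of that AND, assuming the compare result is already available as eight 32-bit lanes of all-ones or zero. The function name is hypothetical and the `target("avx2")` attribute is a GCC/Clang assumption:

```c
#include <immintrin.h>
#include <stdint.h>

/* Assumption: GCC or Clang; the attribute enables AVX2 per function. */
#define AVX2_FN __attribute__((target("avx2")))

/* Turn a compare mask (each lane 0xFFFFFFFF or 0) into 1.0f / 0.0f.
   ANDing all-ones with the bit pattern of 1.0f leaves 1.0f;
   ANDing zero with anything leaves +0.0f. */
AVX2_FN void mask_to_float01(const uint32_t *mask, float *out)
{
    __m256 m = _mm256_castsi256_ps(
        _mm256_loadu_si256((const __m256i *)mask));
    __m256 ones = _mm256_set1_ps(1.0f);
    _mm256_storeu_ps(out, _mm256_and_ps(m, ones));
}
```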
What do you *actually* want to do with the vector of 0.0 / 1.0? Often you can do that with `_mm256_and_ps` or `_mm256_blend_ps` or something with the all-0 / all-1-bits compare result directly, instead of actually creating a float vector with `_mm256_and_ps(_mm256_set1_ps(1.0), cmp_result)`. And does it *have* to be an unsigned integer compare? — Peter Cordes, May 25 '21 at 18:46
@PeterCordes, yes it must be an unsigned integer compare. The above answers have actually solved it for me, but there probably is a faster way. I want to zero out certain values in a __m256 vector by multiplying by this 0/1 mask, where the mask values are 1 with a certain probability. — user3055163, May 25 '21 at 19:02
@user3055163 You can zero-out certain values just by using `_mm256_and_si256` with your all-0 and all-1 bit-mask. No need to do expensive multiply by 0.0 and 1.0. — Arty, May 25 '21 at 19:14
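For a `__m256` data vector, the same idea might look like the sketch below, which uses `_mm256_and_ps` on the cast mask rather than `_mm256_and_si256`. The function name and the array-based interface are hypothetical, the unsigned compare uses the sign-flip idiom from the earlier comments, and the `target("avx2")` attribute is a GCC/Clang assumption:

```c
#include <immintrin.h>
#include <stdint.h>

/* Assumption: GCC or Clang; the attribute enables AVX2 per function. */
#define AVX2_FN __attribute__((target("avx2")))

/* out[i] = v[i] if x[i] > t[i] (unsigned compare), else +0.0f. */
AVX2_FN void zero_unless_greater(const float *v, const uint32_t *x,
                                 const uint32_t *t, float *out)
{
    const __m256i sign = _mm256_set1_epi32(INT32_MIN);
    /* Flip the sign bits so the signed compare orders values as unsigned. */
    __m256i vx = _mm256_xor_si256(
        _mm256_loadu_si256((const __m256i *)x), sign);
    __m256i vt = _mm256_xor_si256(
        _mm256_loadu_si256((const __m256i *)t), sign);
    __m256i gt = _mm256_cmpgt_epi32(vx, vt); /* all-ones where x > t */
    /* AND keeps the float bits where the mask is set, clears them elsewhere. */
    __m256 kept = _mm256_and_ps(_mm256_loadu_ps(v),
                                _mm256_castsi256_ps(gt));
    _mm256_storeu_ps(out, kept);
}
```

Unlike multiplying by 0.0, the AND also yields +0.0 when the input lane holds a NaN or an infinity.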
Thank you all for your help. These comments have solved it for me. What an amazing community! — user3055163, May 25 '21 at 19:19
[Conditional SSE/AVX add or zero elements based on compare](https://stackoverflow.com/q/49982536) is an example of using a compare mask to zero some elements (e.g. on the input of a vector you're going to use with `add`). Oh, and [SSE comparison returns vector of NANs](https://stackoverflow.com/q/57091959) directly answers what's left of your question, using the 0 / (all-ones) NAN mask to zero elements of another vector. Hmm, except I didn't find a good duplicate for the max/cmp idiom to do unsigned compare without AVX-512. Having it only in the question body isn't ideal; I can reopen if someone wants. — Peter Cordes, May 25 '21 at 19:33
@PeterCordes It is especially not ideal since the question now contains a solution for a greater-or-equal comparison, but asks for a greater-than comparison (and like most of the "simulate instruction X on architecture Y" questions, it is kind of an XY problem). I found this duplicate for `uint8` comparisons: https://stackoverflow.com/questions/32945410/sse2-intrinsics-comparing-unsigned-integers — chtz, May 26 '21 at 08:27
@chtz: Great, added that to the duplicates list, now this question should be fully dealt with for the benefit of future readers as well as the OP. AFAIK, SSE2 already has (for bytes) all the useful operations that SSE4.1 / AVX2 have for dwords. — Peter Cordes, May 26 '21 at 08:51
I went a little further messing with this: you can just do a bitwise AND of the eq result with a vector of 1s to get back 0s and 1s. But often you just want to use the bits to find the first index that satisfies the condition, so the all-ones return from `_mm*_eq(...)` (0xFFFFFFFF) is fine (see godbolt below) https://godbolt.org/z/cahdv7Gxh — Steve Bronder, Feb 17 '23 at 23:14
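One common way to get that first index is to collapse the compare result to a bitmask with `_mm256_movemask_ps` and count trailing zeros. The sketch below is an illustration of that idea and not the code behind the godbolt link; the function name is hypothetical, and both `__builtin_ctz` and the `target("avx2")` attribute are GCC/Clang assumptions:

```c
#include <immintrin.h>
#include <stdint.h>

/* Assumption: GCC or Clang; the attribute enables AVX2 per function. */
#define AVX2_FN __attribute__((target("avx2")))

/* Index of the first lane of x equal to key, or -1 if there is none. */
AVX2_FN int first_equal_index(const uint32_t *x, uint32_t key)
{
    __m256i vx = _mm256_loadu_si256((const __m256i *)x);
    /* Equality is the same for signed and unsigned lanes. */
    __m256i eq = _mm256_cmpeq_epi32(vx, _mm256_set1_epi32((int)key));
    /* Take the top bit of each 32-bit lane: bit i is set if lane i matched. */
    int bits = _mm256_movemask_ps(_mm256_castsi256_ps(eq));
    return bits ? __builtin_ctz((unsigned)bits) : -1;
}
```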
