I think it's better to use _mm256_cmp_ps for your question. I have implemented the following program for this purpose. It does a bit more than you asked for: if you want to store ones, set all mask elements to 1, but if you want to store a different number, change the mask value to whatever you want. Note that in this program the mask doubles as the comparison threshold.
//gcc 6.2, Linux-mint, Skylake
#include <stdio.h>
#include <x86intrin.h>
float __attribute__(( aligned(32))) f[8] = {1.2, 0.5, 1.7, 1.9, 0.34, 22.9, 18.6, 1.0};
// float __attribute__(( aligned(32))) r[8]; // Must be {1, 0, 1, 1, 0, 1, 1, 0}
// in C++11, use alignas(32). Or C11 _Alignas(32), instead of GNU C __attribute__.
void printVecps(__m256 vec)
{
    float tempps[8];
    _mm256_storeu_ps(&tempps[0], vec); // unaligned store: tempps is not guaranteed to be 32-byte aligned
    printf(" [0]=%3.2f, [1]=%3.2f, [2]=%3.2f, [3]=%3.2f, [4]=%3.2f, [5]=%3.2f, [6]=%3.2f, [7]=%3.2f \n",
           tempps[0],tempps[1],tempps[2],tempps[3],tempps[4],tempps[5],tempps[6],tempps[7]);
}
int main()
{
    __m256 mask = _mm256_set1_ps(1.0f), vec1, vec2, vec3;

    vec1 = _mm256_load_ps(&f[0]);                         // load vector values from f[0]-f[7]
    printf("vec1 : "); printVecps(vec1);
    vec2 = _mm256_cmp_ps(mask, vec1, _CMP_LT_OS /*0x1*/); // all-ones bits (printed as -nan) where mask < vec1, else 0.0
    printf("vec2 : "); printVecps(vec2);
    vec3 = _mm256_min_ps(vec2, mask);                     // NaN lanes become mask; 0.0 lanes stay 0.0 (for a positive mask)
    printf("vec3 : "); printVecps(vec3);
    return 0;
}
The output for mask = {1,1,1,1,1,1,1,1} is:
vec1 : [0]=1.20, [1]=0.50, [2]=1.70, [3]=1.90, [4]=0.34, [5]=22.90, [6]=18.60, [7]=1.00
vec2 : [0]=-nan, [1]=0.00, [2]=-nan, [3]=-nan, [4]=0.00, [5]=-nan, [6]=-nan, [7]=0.00
vec3 : [0]=1.00, [1]=0.00, [2]=1.00, [3]=1.00, [4]=0.00, [5]=1.00, [6]=1.00, [7]=0.00
And for mask = {2,2,2,2,2,2,2,2} it is:
vec1 : [0]=1.20, [1]=0.50, [2]=1.70, [3]=1.90, [4]=0.34, [5]=22.90, [6]=18.60, [7]=1.00
vec2 : [0]=0.00, [1]=0.00, [2]=0.00, [3]=0.00, [4]=0.00, [5]=-nan, [6]=-nan, [7]=0.00
vec3 : [0]=0.00, [1]=0.00, [2]=0.00, [3]=0.00, [4]=0.00, [5]=2.00, [6]=2.00, [7]=0.00
This depends on the non-commutative behaviour of _mm256_min_ps with NaNs to replace the NaN elements with 1.0. Each lane computes NaN < 1.0 ? NaN : 1.0 = 1.0, because NaN < anything is always false, so the second operand is returned.
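The two facts this trick relies on can be checked in plain scalar C without any AVX hardware. The sketch below is only a model: all_ones_lane and min_lane are hypothetical helper names, and min_lane assumes the per-lane rule of minps is a < b ? a : b.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Bit pattern of a "true" lane produced by _mm256_cmp_ps, reinterpreted as a float. */
static float all_ones_lane(void)
{
    uint32_t bits = 0xFFFFFFFFu;
    float f;
    memcpy(&f, &bits, sizeof f);   /* type-pun without breaking aliasing rules */
    return f;                      /* a NaN with the sign bit set */
}

/* Scalar model of one lane of _mm256_min_ps(a, b); assumed rule: a < b ? a : b. */
static float min_lane(float a, float b)
{
    return a < b ? a : b;          /* any comparison with NaN is false, so b wins */
}
```

Under this model, min_lane(all_ones_lane(), 1.0f) gives 1.0f, while swapping the arguments gives NaN, which is why the operand order in the program matters.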
Beware that gcc before 7.0 treats the 128b _mm_min_ps intrinsic as commutative even without -ffast-math (even though it knows the minps instruction isn't). Use an up-to-date gcc, or make sure that gcc compiles your code with the operands in the order this algorithm needs (or use clang). It's possible that gcc never swaps the operands with AVX, only with SSE (to avoid extra movapd instructions), but the safest option is to use gcc 7 or later.
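If depending on operand order is a concern, one alternative (a different technique from the min-based program above, shown only as a sketch) is to bitwise-AND the compare result with the constant you want to store. The function name gt_select and its separate threshold and value parameters are assumptions made for illustration; the target("avx") attribute is a GNU C extension used here so the function compiles without -mavx.

```c
#include <immintrin.h>

/* For exactly 8 floats: out[i] = value where in[i] > threshold, else 0.0f.
   gt_select is a hypothetical helper, not part of the original program. */
__attribute__((target("avx")))
void gt_select(const float *in, float threshold, float value, float *out)
{
    __m256 v   = _mm256_loadu_ps(in);                /* unaligned load of 8 floats */
    __m256 thr = _mm256_set1_ps(threshold);
    __m256 cmp = _mm256_cmp_ps(v, thr, _CMP_GT_OQ);  /* all-ones bits where in > threshold */
    __m256 val = _mm256_set1_ps(value);
    /* AND keeps the bits of value in true lanes and clears false lanes to 0.0f */
    _mm256_storeu_ps(out, _mm256_and_ps(cmp, val));
}
```

Calling gt_select(f, 1.0f, 1.0f, r) with the array f from the program should fill r with the {1, 0, 1, 1, 0, 1, 1, 0} pattern noted in the comment near the top of the code. Because AND is commutative, the result does not depend on how the compiler orders the operands.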