I'm pretty new to AVX (and C!) and I'm trying to calculate the squared Euclidean distance between two vectors and return a vector filled with 1.0 where the distance is less than or equal to some threshold and 0.0 where it is greater.
For instance, if the distances are [5.0, 6.0, 2.0, 1.0] and the threshold is 4.0, I would like the function to return [0.0, 0.0, 1.0, 1.0]. The code below is what I have so far (adapted a little from "AVX2 float compare and get 0.0 or 1.0 instead of all-0 or all-one bits"). It works but leaves a lot to be desired.
__m256d diff, value, ta, tb, tc, tta, ttb, ttc, sum, comp, mask;
comp = _mm256_set1_pd(4.0); //this is what I want to compare the distance to
ta = _mm256_sub_pd(v1[0], v2[0]);
tb = _mm256_sub_pd(v1[1], v2[1]);
tc = _mm256_sub_pd(v1[2], v2[2]);
tta = _mm256_mul_pd(ta,ta); //(v1.x - v2.x)^2
ttb = _mm256_mul_pd(tb,tb); //(v1.y - v2.y)^2
ttc = _mm256_mul_pd(tc,tc); //(v1.z - v2.z)^2
sum = _mm256_add_pd(_mm256_add_pd(tta,ttb), ttc); //(v1.x - v2.x)^2 + (v1.y - v2.y)^2 + (v1.z - v2.z)^2
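mask = _mm256_cmp_pd(sum, comp, _CMP_LE_OS); // all-ones bits (reads as NaN) where sum <= comp, 0 elsewhere
value = _mm256_div_pd(_mm256_min_pd(mask, comp), comp); // min(NaN, 4.0) = 4.0, min(0, 4.0) = 0; then 4.0/4.0 = 1.0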
For a calculated distance of [5.0, 6.0, 2.0, 1.0], _mm256_cmp_pd gives [0, 0, NaN, NaN] when compared to 4.0 (the "NaN" lanes are really all-one bits). _mm256_min_pd then turns that into [0.0, 0.0, 4.0, 4.0], because it returns its second operand when the first is NaN (this trick is copied from the linked StackOverflow post), and finally I divide by 4.0 to get 1.0 in those lanes. This seems like a pretty hacky way to get what I want; is there an easier way to compare "sum" and "comp" to get 1's and 0's directly?
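For comparison, one alternative that is often suggested is to bitwise-AND the compare mask with a vector of 1.0, which avoids both the min and the divide. Below is a minimal sketch of that idea, not a definitive implementation. The function name within_threshold4 and the plain double-array interface are hypothetical, chosen only to keep the example self-contained, and the target("avx") attribute assumes GCC or Clang (with other compilers, drop it and enable AVX through compiler flags).

```c
#include <immintrin.h>

/* Hypothetical helper (name and signature are illustrative only).
 * For 4 squared distances, writes 1.0 to out[i] where
 * dist2[i] <= threshold and 0.0 otherwise. */
__attribute__((target("avx"))) /* assumption: GCC or Clang */
void within_threshold4(const double *dist2, double threshold, double *out)
{
    __m256d sum  = _mm256_loadu_pd(dist2);      /* unaligned load of 4 doubles */
    __m256d comp = _mm256_set1_pd(threshold);   /* threshold in every lane */

    /* All-ones bits in lanes where sum <= comp, all-zero bits elsewhere. */
    __m256d mask = _mm256_cmp_pd(sum, comp, _CMP_LE_OS);

    /* AND with 1.0: true lanes keep the bit pattern of 1.0,
     * false lanes are cleared to +0.0. */
    __m256d value = _mm256_and_pd(mask, _mm256_set1_pd(1.0));

    _mm256_storeu_pd(out, value);
}
```

Calling it with the distances {5.0, 6.0, 2.0, 1.0} and a threshold of 4.0 should fill out with {0.0, 0.0, 1.0, 1.0}, matching the min/divide version above. In the original snippet the same idea would just be value = _mm256_and_pd(mask, _mm256_set1_pd(1.0));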