AVX allows bitwise logical operations such as AND/OR on the floating-point data types __m256 and __m256d.
However, C++ reasonably doesn't allow bitwise operations on floats and doubles. If I'm right, the standard gives no guarantee on the internal representation of floats (the compiler may or may not use IEEE754), so a programmer can't be sure what the bits of a float will look like.
Consider this example:
    #include <immintrin.h>
    #include <iostream>
    #include <limits>
    #include <cassert>

    int main() {
        // _mm256_load_ps and _mm256_store_ps require 32-byte-aligned addresses
        alignas(32) float x[8] = {1,2,3,4,5,6,7,8};
        alignas(32) float mask[8] = {-1,0,0,-1,0,-1,0,0};
        alignas(32) float x_masked[8];
        assert(std::numeric_limits<float>::is_iec559);
        __m256 x_ = _mm256_load_ps(x);
        __m256 mask_ = _mm256_load_ps(mask);
        __m256 x_masked_ = _mm256_and_ps(x_, mask_);
        _mm256_store_ps(x_masked, x_masked_);
        for (int i = 0; i < 8; i++)
            std::cout << x_masked[i] << " ";
        return 0;
    }
Assuming that IEEE754 is used, I took the representation of -1 to be 0xffffffff, so I would expect the output to be
1 0 0 4 0 6 0 0
while it is instead
1 0 0 1.17549e-38 0 1.17549e-38 0 0
Hence my assumption about the internal representation was probably wrong (or I made some silly mistake).
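One way to test that assumption without AVX is to copy the bits of a float into an integer and look at them. The following is only a minimal sketch, assuming a 32-bit IEEE754 float; the helper names float_bits, float_from_bits and and_floats are made up for illustration and are not part of any library.

```cpp
#include <cstdint>
#include <cstring>
#include <limits>

// Assumption: float is a 32-bit IEEE754 binary32 value.
static_assert(std::numeric_limits<float>::is_iec559, "IEEE754 float expected");
static_assert(sizeof(float) == sizeof(std::uint32_t), "32-bit float expected");

// Hypothetical helper: return the raw bit pattern of a float.
// std::memcpy is the well-defined way to reinterpret the bytes.
std::uint32_t float_bits(float f) {
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return u;
}

// Hypothetical helper: build a float from a raw bit pattern.
float float_from_bits(std::uint32_t u) {
    float f;
    std::memcpy(&f, &u, sizeof f);
    return f;
}

// Hypothetical helper: scalar equivalent of one lane of _mm256_and_ps,
// i.e. a bitwise AND of the two bit patterns.
float and_floats(float a, float b) {
    return float_from_bits(float_bits(a) & float_bits(b));
}
```

With these helpers, float_bits(-1.0f) comes out as 0xbf800000 (sign bit set, biased exponent 127, zero mantissa), not 0xffffffff. That matches the output above: and_floats(4.0f, -1.0f) is 0x40800000 & 0xbf800000 = 0x00800000, the smallest normal float, which prints as 1.17549e-38. A mask whose bits really are all ones, such as float_from_bits(0xffffffffu), leaves 4.0f unchanged.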
So the question is: is there a way to use the floating-point bitwise AND and be sure that the result will make sense?