I have a loop that loads two `float*` arrays into `__m256` vectors and processes them. Following this loop, I have code that loads the balance of values into the vectors and then processes them, so there is no alignment requirement on the function.
Here is the code that loads the balance of the data into the vectors:
```cpp
size_t constexpr FLOATS_IN_M128 = sizeof(__m128) / sizeof(float);
size_t constexpr FLOATS_IN_M256 = FLOATS_IN_M128 * 2;
...
assert(bal < FLOATS_IN_M256);
float ary[FLOATS_IN_M256 * 2];
// zero both halves so the unused lanes are 0
auto v256f_q = _mm256_setzero_ps();
_mm256_storeu_ps(ary, v256f_q);
_mm256_storeu_ps(&ary[FLOATS_IN_M256], v256f_q);
float *dest = ary;
size_t offset{};
// copy the leftover q values into the first half, val values into the second
while (bal--)
{
    dest[offset] = p_q_n[pos];
    dest[offset + FLOATS_IN_M256] = p_val_n[pos];
    offset++;
    pos++;
}
// the two vectors that will be processed
v256f_q = _mm256_loadu_ps(ary);
v256f_val = _mm256_loadu_ps(&ary[FLOATS_IN_M256]);
```
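As an aside, the scalar copy loop (and with it the need for a trip-count bound) can be avoided entirely with AVX masked loads. This is a different technique from the question's copy-through-a-buffer approach, not a fix for the assertion problem itself. It is a sketch assuming AVX2 is available (as it is with `-march=x86-64-v3`); `load_tail_masked` and its `out` parameter are hypothetical names. Lanes at index `bal` and above come back as zero, matching the zero-padded buffer in the question.

```cpp
#include <cassert>
#include <cstddef>
#include <immintrin.h>

// Sketch: load the final bal (< 8) floats of src with a masked load instead
// of a scalar copy loop. Lanes at or beyond bal are zeroed, and masked-off
// lanes do not fault even if they lie past the end of the array.
// target("avx2") (GCC/clang) enables AVX2 for this function only; with
// -march=x86-64-v3 it is redundant. out is a hypothetical 8-float buffer.
__attribute__((target("avx2")))
void load_tail_masked(const float* src, std::size_t bal, float* out)
{
    assert(bal < 8);  // same precondition as in the question
    // Lane indices 0..7.
    const __m256i lane = _mm256_setr_epi32(0, 1, 2, 3, 4, 5, 6, 7);
    // All-ones in lanes where lane < bal, zero elsewhere.
    const __m256i mask =
        _mm256_cmpgt_epi32(_mm256_set1_epi32(static_cast<int>(bal)), lane);
    const __m256 v = _mm256_maskload_ps(src, mask);
    _mm256_storeu_ps(out, v);
}
```

Applied to the question's variables, the two loads would be `_mm256_maskload_ps(&p_q_n[pos], mask)` and `_mm256_maskload_ps(&p_val_n[pos], mask)` using the same mask; whether that beats the buffer version is worth benchmarking.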
When I use Compiler Explorer set to "x86-64 clang 16.0.0 -march=x86-64-v3 -O3", the compiler unrolls the loop when the `assert(bal < FLOATS_IN_M256);` line is present. However, `assert()` is ignored in RELEASE mode, meaning the loop won't be vectorized and unrolled.
To test, I defined `NDEBUG`, and the loop is no longer vectorized and unrolled.
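Because the unrolling hinges on the compiler knowing that `bal` is at most 7, one standard-C++ way to keep that fact in release builds is to write the bound into the loop condition itself, where `NDEBUG` cannot remove it. This is a minimal sketch under that assumption: `copy_tail`, `q_out` and `val_out` are hypothetical names, and whether clang 16 then emits the same unrolled code as the `assert` build should be checked in Compiler Explorer rather than taken for granted.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t FLOATS_IN_M256 = 8;  // sizeof(__m256) / sizeof(float)

// Sketch: copy the bal (< 8) leftover elements into zero-padded buffers.
// The extra "i < FLOATS_IN_M256 - 1" test is redundant when the precondition
// holds, but it is part of the loop, so it survives NDEBUG and gives the
// optimizer a constant maximum trip count of 7.
void copy_tail(const float* p_q_n, const float* p_val_n,
               std::size_t pos, std::size_t bal,
               float* q_out, float* val_out)
{
    assert(bal < FLOATS_IN_M256);  // still checked in debug builds
    for (std::size_t i = 0; i < FLOATS_IN_M256; ++i) {  // zero padding
        q_out[i] = 0.0f;
        val_out[i] = 0.0f;
    }
    for (std::size_t i = 0; i < bal && i < FLOATS_IN_M256 - 1; ++i) {
        q_out[i] = p_q_n[pos + i];
        val_out[i] = p_val_n[pos + i];
    }
}
```

Unlike a pure assumption, the extra comparison is ordinary code (the optimizer may fold it), and if `bal` were ever 8 or more it truncates the copy instead of causing undefined behavior.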
I have tried adding the following in the appropriate places, but they don't work:
```cpp
#pragma clang loop vectorize(enable)
#pragma unroll
#undef NDEBUG
```
The compiler should be able to see from the code before the snippet above that `bal < 8`, but it doesn't. How can I tell it that this assertion is true when not in DEBUG mode?
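Compilers provide explicit assumption hints for this: clang has `__builtin_assume(expr)`, GCC code conventionally writes `if (!(expr)) __builtin_unreachable();`, and MSVC has `__assume(expr)`. C++23 adds the standard `[[assume(expr)]]` attribute and `std::unreachable()` in `<utility>`, though the attribute may not be available in clang 16. Below is a minimal sketch assuming a hypothetical `ASSUME` macro that stays a checked `assert` in debug builds and becomes an optimizer hint under `NDEBUG`; `fill_tail` is likewise a hypothetical wrapper around the question's loop.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical macro: an ordinary assert() in debug builds, an optimizer
// assumption when NDEBUG is defined. clang also defines __GNUC__, so the
// clang branch must come first.
#if !defined(NDEBUG)
#define ASSUME(cond) assert(cond)
#elif defined(__clang__)
#define ASSUME(cond) __builtin_assume(cond)
#elif defined(__GNUC__)
#define ASSUME(cond) do { if (!(cond)) __builtin_unreachable(); } while (0)
#elif defined(_MSC_VER)
#define ASSUME(cond) __assume(cond)
#else
#define ASSUME(cond) ((void)0)
#endif

constexpr std::size_t FLOATS_IN_M256 = 8;

// The question's tail copy with the assertion replaced by ASSUME.
// ary must hold 2 * FLOATS_IN_M256 floats: q values go in the first half,
// val values in the second, and unused slots are zero.
void fill_tail(const float* p_q_n, const float* p_val_n,
               std::size_t pos, std::size_t bal, float* ary)
{
    ASSUME(bal < FLOATS_IN_M256);
    for (std::size_t i = 0; i < 2 * FLOATS_IN_M256; ++i) ary[i] = 0.0f;
    for (std::size_t offset = 0; offset < bal; ++offset, ++pos) {
        ary[offset] = p_q_n[pos];
        ary[offset + FLOATS_IN_M256] = p_val_n[pos];
    }
}
```

If the assumed condition is ever false in a release build, the behavior is undefined, so this is only safe when the surrounding code really guarantees `bal < 8`.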