I am having trouble wrapping my mind around which bits need to be set for masking using _mm256_maskload_ps.
The documentation states that the mask is the "integer value calculated based on the most-significant-bit of each doubleword of a mask register"
Parsing this out, I think there are four 64-bit integers. I want to mask 8 values, so I can think of this as eight 32-bit integers (this is where my understanding gets shaky), each of which has its MSB reserved for the sign: 1 for negative and 0 for non-negative. So I could set -1 for "please load this" and 0 for "don't load this" in each of the eight 32-bit integers, and my mask should be correct. However, we actually have four 64-bit integers, so maybe I have to pack them?
Essentially, I'm looking for a way to describe a mask such that the first 1, 2, 3, ..., 8 elements are loaded when I call _mm256_maskload_ps.
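A minimal sketch of the kind of mask I have in mind, assuming each of the eight 32-bit lanes is selected by its most significant bit. The names first_n_lanes and first_n_mask are hypothetical helpers of my own, not part of any library; the intrinsic part is only compiled when AVX is enabled (for example with -mavx).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#ifdef __AVX__
#include <immintrin.h>
#endif

// Hypothetical helper: lane i is -1 (all bits set, so its MSB is 1)
// when i < n, and 0 otherwise.
std::array<std::int32_t, 8> first_n_lanes(std::size_t n)
{
    assert(n <= 8);
    std::array<std::int32_t, 8> lanes{}; // value-initialized to all zeros
    for (std::size_t i = 0; i < n; ++i)
        lanes[i] = -1;
    return lanes;
}

#ifdef __AVX__
// Hypothetical helper: copy the eight 32-bit lane values into a mask register
// with an unaligned 256-bit integer load.
__m256i first_n_mask(std::size_t n)
{
    const std::array<std::int32_t, 8> lanes = first_n_lanes(n);
    return _mm256_loadu_si256(reinterpret_cast<const __m256i *>(lanes.data()));
}
#endif
```

With this, _mm256_maskload_ps(a, first_n_mask(3)) should read only a[0] through a[2] and leave the remaining five lanes of the result at zero.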
Note: What's interesting is that when my mask is {-1, 0, 0, 0}, the first 2 elements get set. When my mask is {0xFFFFFFFF, 0, 0, 0}, only the first element gets set.
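My best guess at what is happening, sketched in plain integer code: GCC and Clang declare __m256i as a vector of four long long, so the braces initialize four 64-bit elements, and on little-endian x86 each of those covers two adjacent 32-bit lanes. The helpers as_dword_lanes and lane_selected below are hypothetical names that only model that layout; they do not touch a real register.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: split four 64-bit elements into eight 32-bit lanes,
// low half first, mirroring the little-endian layout of a 256-bit register.
std::array<std::uint32_t, 8> as_dword_lanes(const std::array<std::int64_t, 4> &qwords)
{
    std::array<std::uint32_t, 8> lanes{};
    for (std::size_t i = 0; i < 4; ++i)
    {
        const auto bits = static_cast<std::uint64_t>(qwords[i]);
        lanes[2 * i] = static_cast<std::uint32_t>(bits);           // low 32 bits
        lanes[2 * i + 1] = static_cast<std::uint32_t>(bits >> 32); // high 32 bits
    }
    return lanes;
}

// True when the most significant bit of a 32-bit lane is set,
// which is the bit the masked load looks at.
bool lane_selected(std::uint32_t lane)
{
    return (lane >> 31) != 0;
}
```

Under this model, a 64-bit -1 is 0xFFFFFFFFFFFFFFFF, so lanes 0 and 1 both have their MSB set, while a 64-bit 0xFFFFFFFF is 0x00000000FFFFFFFF, so only lane 0 does. That would match the two behaviors I am seeing.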
#include <iostream>
#include <immintrin.h>
#include <string>
using namespace std;
int main()
{
    float a[3] {1,2,3};
    float b[3] {11, 22, 33};
    auto disp = [](float *arr) {
        cout << "[";
        string sep;
        for (size_t i = 0; i < 3; i++)
        {
            cout << sep << arr[i];
            sep = ", ";
        }
        cout << "]";
        cout << endl;
    };
    disp(a);
    disp(b);
    __m256 _a, _b;
    __m256i _load_mask = {-1, 0, 0, 0};
    _a = _mm256_maskload_ps(a, _load_mask);
    _b = _mm256_maskload_ps(b, _load_mask);
    _a = _mm256_add_ps(_a, _b);
    float c[8];
    _mm256_storeu_ps(c, _a);
    disp(c);
    return 0;
}
Displays
[1, 2, 3]
[11, 22, 33]
[12, 24, 0]
when compiled with
!clang++ -mavx -Wall -Wextra -std=c++17 -stdlib=libc++ -ggdb % -o $(basename -s .cpp %)
on my Mac, where % is the filename.