While doing my homework, which is to implement Conway's Game of Life using intrinsic functions, I found working code, but I cannot understand the main part of it.
This implementation first calculates the number of alive neighbors for each cell and stores the result in the array counts, while the array of cells (the world) is states. I cannot really see how newstate is generated here. I understand how a left shift works and how bitwise OR works, but I cannot understand why they are used like this, why shufmask has these values, and how the shuffle works. I also cannot understand why _mm256_slli_epi16 is used when the array elements are of type uint8_t. So my question is all about this line:
__m256i newstate = _mm256_shuffle_epi8(shufmask, _mm256_or_si256(c, _mm256_slli_epi16(oldstate, 3)));
Could you please explain, for a beginner like me and in as much detail as possible, how it works?
void gameoflife8vec(uint8_t *counts, uint8_t *states, size_t width, size_t height) {
    assert(width % (sizeof(__m256i)) == 0);
    size_t awidth = width + 2;
    computecounts8vec(counts, states, width, height);
    __m256i shufmask =
        _mm256_set_epi8(
            0, 0, 0, 0, 0, 1, 1, 0,
            0, 0, 0, 0, 0, 1, 0, 0,
            0, 0, 0, 0, 0, 1, 1, 0,
            0, 0, 0, 0, 0, 1, 0, 0
        );
    for (size_t i = 0; i < height; i++) {
        for (size_t j = 0; j < width; j += sizeof(__m256i)) {
            __m256i c = _mm256_lddqu_si256(
                (const __m256i *)(counts + (i + 1) * awidth + j + 1));
            // max was 8 = 0b1000, make it 7, 1 becomes 0, 0 remains 0
            c = _mm256_subs_epu8(c, _mm256_set1_epi8(1));
            __m256i oldstate = _mm256_lddqu_si256(
                (const __m256i *)(states + (i + 1) * awidth + j + 1));
            __m256i newstate = _mm256_shuffle_epi8(
                shufmask, _mm256_or_si256(c, _mm256_slli_epi16(oldstate, 3)));
            _mm256_storeu_si256((__m256i *)(states + (i + 1) * awidth + (j + 1)),
                                newstate);
        }
    }
}
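For comparison, here is a scalar sketch of what each byte lane of that line appears to compute. It is only a model and is not taken from the repository: the helper name next_state_scalar is hypothetical, and it assumes every state byte is 0 or 1 and every count is between 0 and 8. The table lists one 128-bit half of shufmask from byte 0 upward, which is the reverse of the argument order of _mm256_set_epi8.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of one byte lane of the vector line.
   Hypothetical helper, not part of the original repository.
   Assumes count is in 0..8 and oldstate is 0 (dead) or 1 (alive). */
static uint8_t next_state_scalar(uint8_t count, uint8_t oldstate) {
    /* Same 16 entries as one 128-bit half of shufmask, byte 0 first. */
    static const uint8_t table[16] = {
        0, 0, 1, 0, 0, 0, 0, 0, /* dead cell: only index 2 (count 3) gives 1 */
        0, 1, 1, 0, 0, 0, 0, 0  /* live cell: indices 9 and 10 (counts 2, 3) give 1 */
    };
    assert(count <= 8 && oldstate <= 1);

    /* Saturating subtract, like _mm256_subs_epu8: 0 stays 0, 8 becomes 7,
       so the count fits in 3 bits. */
    uint8_t c = count > 0 ? (uint8_t)(count - 1) : 0;

    /* Bit 3 carries the old state, bits 0..2 carry count - 1. */
    uint8_t index = (uint8_t)(c | (oldstate << 3));

    /* The byte shuffle uses the low 4 bits of each index byte to pick
       a table entry within its 128-bit half. */
    return table[index & 0x0F];
}
```

In this model, next_state_scalar(3, 0) and next_state_scalar(2, 1) return 1, while next_state_scalar(4, 1) returns 0. On the shift: AVX2 has no 8-bit shift instruction, and if the states are only 0 or 1, shifting 16-bit lanes left by 3 cannot carry a bit from one byte into its neighbor.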
The memory for the arrays is allocated like this:
uint8_t *states = (uint8_t *)malloc((N + 2) * (N + 2) * sizeof(uint8_t));
uint8_t *counts = (uint8_t *)malloc((N + 2) * (N + 2) * sizeof(uint8_t));
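To make the (i + 1) * awidth + j + 1 indexing concrete, here is a small scalar sketch of the padded layout. It is only an illustration: padded_index and compute_counts_scalar are hypothetical names, the counting loop is a stand-in for computecounts8vec rather than its actual code, and it assumes the one-cell border around the grid always holds zeros.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Index of interior cell (i, j) in a grid stored with a one-cell border,
   so each stored row has width + 2 bytes. Hypothetical helper. */
static size_t padded_index(size_t i, size_t j, size_t width) {
    size_t awidth = width + 2;
    return (i + 1) * awidth + (j + 1);
}

/* Scalar neighbor count over the interior cells.
   Hypothetical stand-in for computecounts8vec; assumes border cells are 0. */
static void compute_counts_scalar(uint8_t *counts, const uint8_t *states,
                                  size_t width, size_t height) {
    assert(counts != NULL && states != NULL);
    size_t awidth = width + 2;
    for (size_t i = 0; i < height; i++) {
        for (size_t j = 0; j < width; j++) {
            size_t p = padded_index(i, j, width);
            /* Sum of the 8 surrounding cells; the border makes edge
               cells safe to read without bounds checks. */
            counts[p] = (uint8_t)(
                states[p - awidth - 1] + states[p - awidth] + states[p - awidth + 1] +
                states[p - 1]                               + states[p + 1] +
                states[p + awidth - 1] + states[p + awidth] + states[p + awidth + 1]);
        }
    }
}
```

With this layout, a grid of N by N cells needs (N + 2) * (N + 2) bytes, which matches the two malloc calls above.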
The full source code can be found here: https://github.com/lemire/SIMDgameoflife