I think the vpshufb instruction would work well for something I'm trying to do, but I don't understand how the shuffle control mask determines where each byte of the vector ends up, and I haven't been able to find an explanation of this online.
Intrinsic version of the instruction: _mm256_shuffle_epi8(__m256i a, __m256i b)
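For reference, here is a rough scalar model of what the control mask does, as I understand it from Intel's description of the instruction. The second argument, b, is the control mask: each of its 32 bytes selects the source byte for the same position in the result. If the top bit of a control byte is set, the result byte is zeroed. Otherwise the low 4 bits pick a byte from a, but only from within the same 128-bit lane, so bytes cannot cross between the lower and upper halves. The function name shuffle_epi8_model and the use of plain byte arrays are only for illustration; this is a sketch of the semantics, not the intrinsic itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of _mm256_shuffle_epi8(a, b).
 * a    : 32 source bytes (the data vector)
 * ctrl : 32 control bytes (the shuffle control mask, argument b)
 * dst  : 32 result bytes; may alias a or ctrl because a temporary is used
 */
void shuffle_epi8_model(const uint8_t *a, const uint8_t *ctrl, uint8_t *dst)
{
    assert(a != NULL && ctrl != NULL && dst != NULL);

    uint8_t tmp[32];
    for (int i = 0; i < 32; i++) {
        if (ctrl[i] & 0x80) {
            /* Top bit of the control byte set: result byte is zero. */
            tmp[i] = 0;
        } else {
            /* Lane base is 0 for bytes 0..15 and 16 for bytes 16..31. */
            int lane_base = i & 0x10;
            /* Only the low 4 bits select; bits 4..6 are ignored. */
            int index = ctrl[i] & 0x0F;
            tmp[i] = a[lane_base + index];
        }
    }
    memcpy(dst, tmp, sizeof tmp);
}
```

So a control mask whose bytes are 15, 14, ..., 0 in both lanes reverses the bytes inside each 128-bit lane separately, and a control byte of 0x80 clears that position. With the real intrinsic, the mask would typically be built with something like _mm256_setr_epi8(...) from <immintrin.h> (compiled with AVX2 enabled, e.g. -mavx2 on GCC or Clang) and passed as the second argument.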