I was studying the expand and compress operations from the Intel intrinsics guide. I'm confused about these two concepts:
For __m128d _mm_mask_expand_pd (__m128d src, __mmask8 k, __m128d a) == vexpandpd
Load contiguous active double-precision (64-bit) floating-point elements from a (those with their respective bit set in mask k), and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
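Here is my current reading of the expand description as a scalar sketch. The helper name `expand_model` and the generic lane count `n` are my own, not part of Intel's API; for `_mm_mask_expand_pd`, `n` would be 2. I am also assuming that `dst` does not alias `a` or `src`.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar model of a masked expand (hypothetical helper, not an Intel API).
 * Assumes n <= 8 and that dst does not overlap a or src. */
void expand_model(double *dst, const double *src, uint8_t k,
                  const double *a, size_t n)
{
    size_t m = 0;                  /* index of the next unread low element of a */
    for (size_t j = 0; j < n; j++) {
        if ((k >> j) & 1u) {
            dst[j] = a[m++];       /* mask bit set: take the next packed element of a */
        } else {
            dst[j] = src[j];       /* mask bit clear: pass src through unchanged */
        }
    }
}
```

If this reading is right, then with two lanes and `k = 0b10`, the result is `dst = { src[0], a[0] }`: the low elements of `a` are read in order and scattered to the lanes whose mask bit is set.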
For __m128d _mm_mask_compress_pd (__m128d src, __mmask8 k, __m128d a) == vcompresspd
Contiguously store the active double-precision (64-bit) floating-point elements in a (those with their respective bit set in writemask k) to dst, and pass through the remaining elements from src.
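Here is my current reading of the compress description as a scalar sketch. The helper name `compress_model` and the generic lane count `n` are my own, not part of Intel's API; for `_mm_mask_compress_pd`, `n` would be 2. I am also assuming that `dst` does not alias `a` or `src`.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar model of a masked compress (hypothetical helper, not an Intel API).
 * Assumes n <= 8 and that dst does not overlap a or src. */
void compress_model(double *dst, const double *src, uint8_t k,
                    const double *a, size_t n)
{
    size_t m = 0;                  /* index of the next free low slot in dst */
    for (size_t j = 0; j < n; j++) {
        if ((k >> j) & 1u) {
            dst[m++] = a[j];       /* mask bit set: pack a[j] into the low end of dst */
        }
    }
    for (; m < n; m++) {
        dst[m] = src[m];           /* remaining upper lanes come from src */
    }
}
```

If this reading is right, then with two lanes and `k = 0b10`, the result is `dst = { a[1], src[1] }`: the selected elements of `a` are gathered into the low lanes, and the leftover upper lanes are filled from `src`.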
Is there a clearer description of these two operations, or can anyone explain how they differ in more detail?