I am studying AVX-512 and have a question about VORPS.
The documentation says:
EVEX.512.0F.W0 56 /r VORPS zmm1 {k1}{z}, zmm2, zmm3/m512/m32bcst
Return the bitwise logical OR of packed single-precision floating-point values in zmm2 and zmm3/m512/m32bcst subject to writemask k1.
EVEX encoded versions: The first source operand is a ZMM/YMM/XMM register. The second source operand can be a ZMM/YMM/XMM register, a 512/256/128-bit memory location, or a 512/256/128-bit vector broadcasted from a 32-bit memory location. The destination operand is a ZMM/YMM/XMM register conditionally updated with writemask k1.
Ref: https://www.felixcloutier.com/x86/orps
What does "subject to writemask k1" mean?
Can anyone give a concrete example of how k1 affects this instruction?
I wrote this code to experiment with VORPS: https://godbolt.org/z/fMcqoa
Code:
#include <stdio.h>
#include <stddef.h>
#include <stdint.h>

int main()
{
    register uint8_t *st_data asm("rbx");

    asm volatile(
        // Fix stack alignment (64-byte boundary for vmovdqa64)
        "andq $~0x3f, %%rsp\n\t"
        // Allocate stack
        "subq $0x100, %%rsp\n\t"
        // Save the stack pointer to st_data
        "movq %%rsp, %[st_data]\n\t"
        // Fill the 64 bytes at the top of the stack with 0x01
        "movq %%rsp, %%rdi\n\t"
        "movl $0x40, %%ecx\n\t"
        "movl $0x1, %%eax\n\t"
        "rep stosb\n\t"
        // Fill the next 64 bytes with 0x02
        "incl %%eax\n\t"
        "leaq 0x40(%%rsp), %%rdi\n\t"
        "movl $0x40, %%ecx\n\t"
        "rep stosb\n\t"
        // Load the 0x01 and 0x02 blocks into ZMM registers
        "vmovdqa64 (%%rsp), %%zmm0\n\t"
        "vmovdqa64 0x40(%%rsp), %%zmm1\n\t"
        // Set the write masks
        "movq $0x123456, %%rax\n\t"
        "kmovq %%rax, %%k0\n\t"
        "kmovq %%rax, %%k1\n\t"
        "kmovq %%rax, %%k2\n\t"
        // Execute vorps, store the result in ZMM2
        "vorps %%zmm0, %%zmm1, %%zmm2\n\t"
        // Store the result back to memory
        "vmovdqa64 %%zmm2, 0x80(%%rsp)\n\t"
        "vzeroupper"
        : [st_data]"=r"(st_data)
        :
        : "rax", "rcx", "rdi", "zmm0", "zmm1",
          "zmm2", "memory", "cc"
    );

    static const char *x[] = {
        "Data 1:", "Data 2:", "Result:"
    };

    // Dump the three 64-byte blocks, 8 bytes per row
    for (size_t i = 0; i < 3; i++) {
        printf("%s\n", x[i]);
        for (size_t j = 0; j < 8; j++) {
            for (size_t k = 0; k < 8; k++) {
                printf("%02x ", *st_data++);
            }
            printf("\n");
        }
        printf("\n");
    }
    fflush(stdout);

    asm volatile(
        // sys_exit
        "movl $0x3c, %eax\n\t"
        "xorl %edi, %edi\n\t"
        "syscall"
    );
}
I tried changing the values of k0, k1, and k2, but the result is always the same.
Result:
03 03 03 03 03 03 03 03
03 03 03 03 03 03 03 03
03 03 03 03 03 03 03 03
03 03 03 03 03 03 03 03
03 03 03 03 03 03 03 03
03 03 03 03 03 03 03 03
03 03 03 03 03 03 03 03
03 03 03 03 03 03 03 03