I wrote code to add two arrays using KNC instructions (512-bit vectors) on an Intel Xeon Phi coprocessor. However, I get a segmentation fault in the inline assembly part.
Here is my code:
int main(int argc, char* argv[])
{
    int i;
    const int length = 65536;
    const int AVXLength = length / 16;
    float *A = (float*) aligned_malloc(length * sizeof(float), 64);
    float *B = (float*) aligned_malloc(length * sizeof(float), 64);
    float *C = (float*) aligned_malloc(length * sizeof(float), 64);
    for(i=0; i<length; i++){
        A[i] = 1;
        B[i] = 2;
    }
    float * pA = A;
    float * pB = B;
    float * pC = C;
    for(i=0; i<AVXLength; i++ ){
        __asm__("vmovaps %1,%%zmm0\n"
                "vmovaps %2,%%zmm1\n"
                "vaddps %%zmm0,%%zmm0,%%zmm1\n"
                "vmovaps %%zmm0,%0;"
                : "=m" (pC) : "m" (pA), "m" (pB));
        pA += 512;
        pB += 512;
        pC += 512;
    }
    return 0;
}
I am using gcc as the compiler (because I don't have the money to buy the Intel compiler), and this is the command line I use to compile the code:
k1om-mpss-linux-gcc add.c -o add.out
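The problem was in the inline assembly. The "m" constraints were given the pointer variables pA, pB and pC themselves, so each memory operand referred to the pointer variable rather than to the array data it points to. vmovaps on a 512-bit register requires a 64-byte-aligned address, which the pointer variable generally does not have, hence the segmentation fault. Dereferencing the pointers in the constraints (*pA, *pB, *pC) fixes the fault. The vaddps operands were also wrong: in AT&T syntax the destination comes last, so vaddps %%zmm0,%%zmm0,%%zmm1 put zmm0 + zmm0 into zmm1, and the following store wrote the unchanged zmm0. Finally, one 512-bit vector holds 16 floats and pointer arithmetic on a float * counts elements, so the pointers must advance by 16 per iteration, not 512; otherwise the loop runs far past the end of the arrays. With these fixes the loop becomes:
for(i=0; i<AVXLength; i++){
    __asm__("vmovaps %1,%%zmm1\n"
            "vmovaps %2,%%zmm2\n"
            "vaddps %%zmm1,%%zmm2,%%zmm3\n"
            "vmovaps %%zmm3,%0;"
            : "=m" (*pC) : "m" (*pA), "m" (*pB));
    pA += 16;  /* 16 floats = one 512-bit vector */
    pB += 16;
    pC += 16;
}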
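To validate what the KNC loop writes, it can help to compare against a portable scalar version that uses the same blocking. The sketch below is not KNC code and should run on any host: it walks the arrays in 16-float blocks (one 512-bit vector each), allocates with POSIX posix_memalign in place of the question's aligned_malloc (whose definition is not shown), and counts elements of C that differ from A + B. The names alloc_aligned_floats, add_blocks_16 and run_reference_check are hypothetical, chosen only for this example.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define VEC_FLOATS 16 /* one 512-bit vector = 16 x 32-bit floats */

/* Hypothetical helper: allocate n floats on a 64-byte boundary.
   posix_memalign stands in for the question's aligned_malloc. */
static float *alloc_aligned_floats(size_t n)
{
    void *p = NULL;
    if (posix_memalign(&p, 64, n * sizeof(float)) != 0)
        return NULL;
    return p;
}

/* Hypothetical helper: scalar stand-in for the vmovaps/vaddps/vmovaps
   body; each outer iteration handles exactly one 16-float block. */
static void add_blocks_16(const float *pA, const float *pB, float *pC,
                          int blocks)
{
    for (int b = 0; b < blocks; b++) {
        for (int k = 0; k < VEC_FLOATS; k++)
            pC[k] = pA[k] + pB[k];
        pA += VEC_FLOATS; /* advance one vector: 16 elements, not 512 */
        pB += VEC_FLOATS;
        pC += VEC_FLOATS;
    }
}

/* Hypothetical driver: fills A with 1 and B with 2 as in the question,
   runs the blocked loop and returns the number of wrong elements in C,
   or -1 if an allocation fails. */
int run_reference_check(int length)
{
    assert(length % VEC_FLOATS == 0);
    float *A = alloc_aligned_floats(length);
    float *B = alloc_aligned_floats(length);
    float *C = alloc_aligned_floats(length);
    int bad = -1;

    if (A && B && C) {
        /* The 64-byte alignment that vmovaps on a zmm register needs. */
        assert((uintptr_t)A % 64 == 0 && (uintptr_t)C % 64 == 0);
        for (int i = 0; i < length; i++) {
            A[i] = 1;
            B[i] = 2;
            C[i] = 0;
        }
        add_blocks_16(A, B, C, length / VEC_FLOATS);
        bad = 0;
        for (int i = 0; i < length; i++)
            if (C[i] != A[i] + B[i])
                bad++;
    }
    free(A); /* free(NULL) is a no-op, so partial failures are safe */
    free(B);
    free(C);
    return bad;
}
```

Calling run_reference_check(65536) mirrors the question's setup and returns 0 when every C[i] equals A[i] + B[i]. The same counting loop can be run on C after the inline-assembly version to confirm that the store wrote the sum rather than a copy of one input.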