I have the following loop in my inline assembly code:
"LOOP%=:\n\t"
"movapd (%%eax), %%xmm4\n\t"
"addl $32, %%eax\n\t"
"movsd (%%edx), %%xmm5\n\t"
"addl $16, %%edx\n\t"
"movapd %%xmm4, %%xmm6\n\t"
"subl $1, %%ecx\n\t"
"unpcklpd %%xmm5, %%xmm5\n\t"
"testl %%ecx, %%ecx\n\t"
"mulpd %%xmm5, %%xmm6\n\t"
"movsd -8(%%edx), %%xmm7\n\t"
"addpd %%xmm6, %%xmm0\n\t"
"movapd -16(%%eax), %%xmm6\n\t"
"unpcklpd %%xmm7, %%xmm7\n\t"
"mulpd %%xmm6, %%xmm5\n\t"
"addpd %%xmm5, %%xmm1\n\t"
"mulpd %%xmm7, %%xmm4\n\t"
"addpd %%xmm4, %%xmm2\n\t"
"mulpd %%xmm6, %%xmm7\n\t"
"addpd %%xmm7, %%xmm3\n\t"
"jne LOOP%=\n\t"
This code keeps a loop counter in %ecx while scanning two (double *) arrays, A and B, and performing some computation with SSE2. Both arrays are aligned to 64 bytes (cache-line alignment, so the 16-byte alignment that movapd requires is satisfied). %eax holds a pointer to array A and %edx holds a pointer to array B. The code runs correctly and there is no memory read error. What I am wondering is why the pointers have to be incremented early, with the later loads then using negative offsets:
"movapd (%%eax), %%xmm4\n\t"
"addl $32, %%eax\n\t"
"movsd (%%edx), %%xmm5\n\t"
"addl $16, %%edx\n\t"
......
"movsd -8(%%edx), %%xmm7\n\t"
......
"movapd -16(%%eax), %%xmm6\n\t"
......
So I changed the initial version so that each pointer is incremented only after its last load, using positive offsets instead:
"LOOP%=:\n\t"
"movapd (%%eax), %%xmm4\n\t"
"movsd (%%edx), %%xmm5\n\t"
"movapd %%xmm4, %%xmm6\n\t"
"subl $1, %%ecx\n\t"
"unpcklpd %%xmm5, %%xmm5\n\t"
"testl %%ecx, %%ecx\n\t"
"mulpd %%xmm5, %%xmm6\n\t"
"movsd 8(%%edx), %%xmm7\n\t"
"addl $16, %%edx\n\t"
"addpd %%xmm6, %%xmm0\n\t"
"movapd 16(%%eax), %%xmm6\n\t"
"addl $32, %%eax\n\t"
"unpcklpd %%xmm7, %%xmm7\n\t"
"mulpd %%xmm6, %%xmm5\n\t"
"addpd %%xmm5, %%xmm1\n\t"
"mulpd %%xmm7, %%xmm4\n\t"
"addpd %%xmm4, %%xmm2\n\t"
"mulpd %%xmm6, %%xmm7\n\t"
"addpd %%xmm7, %%xmm3\n\t"
"jne LOOP%=\n\t"
But now I get a segmentation fault caused by an invalid read.
This seems strange to me, since the same addresses should be loaded in both versions. Why does it happen?
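
For reference, here is a scalar C sketch of what I understand the loop to compute. The function name `kernel_ref`, the parameter `n` (the initial value of %ecx, assumed to be at least 1), and the `acc` array standing in for xmm0 to xmm3 are hypothetical names for illustration only. They do not appear in my actual code.

```c
#include <stddef.h>

/* Scalar reference for the SSE2 loop (illustrative sketch only).
 * Assumed layout: acc[0..1] ~ xmm0, acc[2..3] ~ xmm1,
 *                 acc[4..5] ~ xmm2, acc[6..7] ~ xmm3.
 * The caller is assumed to initialise acc before the call. */
void kernel_ref(const double *A, const double *B, size_t n, double acc[8])
{
    for (size_t k = 0; k < n; ++k) {
        const double *a = A + 4 * k;  /* %eax advances 32 bytes = 4 doubles */
        const double *b = B + 2 * k;  /* %edx advances 16 bytes = 2 doubles */

        acc[0] += a[0] * b[0];        /* xmm0 += A[0..1] * B[0] */
        acc[1] += a[1] * b[0];
        acc[2] += a[2] * b[0];        /* xmm1 += A[2..3] * B[0] */
        acc[3] += a[3] * b[0];
        acc[4] += a[0] * b[1];        /* xmm2 += A[0..1] * B[1] */
        acc[5] += a[1] * b[1];
        acc[6] += a[2] * b[1];        /* xmm3 += A[2..3] * B[1] */
        acc[7] += a[3] * b[1];
    }
}
```

Each iteration reads 4 doubles from A and 2 doubles from B, so as far as I can tell both assembly versions should touch exactly the same addresses as long as the loop runs the same number of times.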