In the following code, the PSUBSW instruction leaves the result of mm0 - mm1 in mm0. I compiled it with gcc on a MacBook Air.
However, the way I read the Intel developer's manual, PSUBSW should leave the result of mm1 - mm0 in mm1. The manual entry is: PSUBSW mm, mm/m64: Subtract signed packed words in mm/m64 from signed packed words in mm and saturate results.
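#include <stdio.h>
int
main()
{
    short int a[4] = {1111,1112,1113,1114};
    short int b[4] = {1111,2112,3113,4114};
    short int c[4];
    asm volatile (
        "movq (%1),%%mm0\n\t"      /* mm0 = a */
        "movq (%2),%%mm1\n\t"      /* mm1 = b */
        "psubsw %%mm1,%%mm0\n\t"   /* the instruction in question */
        "movq %%mm0,%0\n\t"        /* c = mm0 */
        "emms"
        : "=m"(c)                  /* c is written directly in memory */
        : "r"(&a),"r"(&b)
        : "mm0","mm1","memory");   /* registers used; a and b are read through pointers */
    printf("%d %d %d %d\n", c[0], c[1], c[2], c[3]);
    return 0;
}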
What explains this difference? Which operand is the source, mm0 or mm1? Is this the difference between Intel syntax and AT&T syntax?