I am using SSE to implement matrix multiplication. The code below operates element by element on two arrays.
Here is the plain C++ version, without SSE instructions:
void ComputeArrayCPlusPlus(
    float* pArray1, // [in] first source array
    float* pArray2, // [in] second source array
    float* pResult, // [out] result array
    int nSize)      // [in] size of all arrays
{
    int i;
    float* pSource1 = pArray1;
    float* pSource2 = pArray2;
    float* pDest = pResult;
    for ( i = 0; i < nSize; i++ )
    {
        *pDest = (float)((*pSource1) * (*pSource1) + (*pSource2) * (*pSource2));
        pSource1++;
        pSource2++;
        pDest++;
    }
}
Here is my test code using SSE instructions:
void ComputeArrayCPlusPlusSSE(
    float* pArray1, // [in] first source array
    float* pArray2, // [in] second source array
    float* pResult, // [out] result array
    int nSize)      // [in] size of all arrays
{
    int nLoop = nSize / 4;
    __m128 m1, m2, m3, m4;
    __m128* pSrc1 = (__m128*) pArray1;
    __m128* pSrc2 = (__m128*) pArray2;
    __m128* pDest = (__m128*) pResult;
    for ( int i = 0; i < nLoop; i++ )
    {
        m1 = _mm_mul_ps(*pSrc1, *pSrc1); // m1 = *pSrc1 * *pSrc1
        m2 = _mm_mul_ps(*pSrc2, *pSrc2); // m2 = *pSrc2 * *pSrc2
        m3 = _mm_add_ps(m1, m2);         // m3 = m1 + m2
        *pDest = _mm_sqrt_ps(m3);        // *pDest = sqrt(m3)
        pSrc1++;
        pSrc2++;
        pDest++;
    }
    for (int i = 0; i < 4; i++)
    {
        cout << pResult[i] << endl;
    }
}
Here is my main function:
int main(int argc, char* argv[])
{
    float left[4] = {1, 1, 1, 1};
    float right[4] = {1, 1, 1, 1};
    float result[4] = {1, 1, 1, 1};
    ComputeArrayCPlusPlusSSE(left, right, result, 4);
    system("pause");
    return 0;
}
When I run it, Visual Studio 2012 reports an access violation in the ComputeArrayCPlusPlusSSE function at these lines:
    m1 = _mm_mul_ps(*pSrc1, *pSrc1); // m1 = *pSrc1 * *pSrc1
    m2 = _mm_mul_ps(*pSrc2, *pSrc2); // m2 = *pSrc2 * *pSrc2
I don't know why. The code has no syntax errors, the arrays are initialized, and the __m128 data is initialized as well. I hope someone can help me out. Thanks in advance.
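
One thing that may be worth checking, as an assumption rather than a confirmed diagnosis: dereferencing a __m128* usually compiles to an aligned load, which requires a 16-byte-aligned address, and plain float arrays on the stack are not guaranteed to be aligned that way. The sketch below shows a way to test a pointer's alignment and a variant of the loop that uses unaligned loads and stores instead of pointer casts. The names IsAligned16 and ComputeArraySSEUnaligned are hypothetical and do not appear in the code above.

```cpp
#include <cstdint>
#include <xmmintrin.h> // SSE intrinsics: __m128, _mm_loadu_ps, _mm_storeu_ps, ...

// Hypothetical helper: returns true if p sits on a 16-byte boundary.
inline bool IsAligned16(const void* p)
{
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// Hypothetical variant of ComputeArrayCPlusPlusSSE. It computes
// sqrt(a*a + b*b) four floats at a time, but uses unaligned loads and
// stores, so it does not depend on how the caller's arrays are aligned.
// Like the original, it ignores any tail of nSize % 4 elements.
void ComputeArraySSEUnaligned(
    const float* pArray1, // [in] first source array
    const float* pArray2, // [in] second source array
    float* pResult,       // [out] result array
    int nSize)            // [in] size of all arrays
{
    int nLoop = nSize / 4;
    for (int i = 0; i < nLoop; i++)
    {
        __m128 a = _mm_loadu_ps(pArray1 + 4 * i);
        __m128 b = _mm_loadu_ps(pArray2 + 4 * i);
        __m128 sum = _mm_add_ps(_mm_mul_ps(a, a), _mm_mul_ps(b, b));
        _mm_storeu_ps(pResult + 4 * i, _mm_sqrt_ps(sum));
    }
}
```

If IsAligned16(left) returns false in main, the alternative to unaligned loads would be to force the arrays onto a 16-byte boundary, for example with alignas(16) in C++11 compilers or, as far as I know, __declspec(align(16)) in Visual Studio 2012, and keep the original pointer casts.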