I am using SSE to implement matrix multiplication. As a first test, the code below operates on two arrays element by element.

The following is the C++ code without SSE instructions.

 void ComputeArrayCPlusPlus(
          float* pArray1,                   // [in] first source array
          float* pArray2,                   // [in] second source array
          float* pResult,                   // [out] result array
          int nSize)                        // [in] size of all arrays
{

    int i;

    float* pSource1 = pArray1;
    float* pSource2 = pArray2;
    float* pDest = pResult;

    for ( i = 0; i < nSize; i++ )
    {
        *pDest = (float)((*pSource1) * (*pSource1) + (*pSource2) * (*pSource2));

        pSource1++;
        pSource2++;
        pDest++;
    }
}

The following is my test code with SSE instructions.

void ComputeArrayCPlusPlusSSE(
          float* pArray1,                   // [in] first source array
          float* pArray2,                   // [in] second source array
          float* pResult,                   // [out] result array
          int nSize)                        // [in] size of all arrays
{

    int nLoop = nSize/ 4;

    __m128 m1, m2, m3, m4;

    __m128* pSrc1 = (__m128*) pArray1;
    __m128* pSrc2 = (__m128*) pArray2;
    __m128* pDest = (__m128*) pResult;


    for ( int i = 0; i < nLoop; i++ )
    {
        m1 = _mm_mul_ps(*pSrc1, *pSrc1);        // m1 = *pSrc1 * *pSrc1
        m2 = _mm_mul_ps(*pSrc2, *pSrc2);        // m2 = *pSrc2 * *pSrc2
        m3 = _mm_add_ps(m1, m2);                // m3 = m1 + m2
        *pDest = _mm_sqrt_ps(m3);               // *pDest = sqrt(m3)

        pSrc1++;
        pSrc2++;
        pDest++;
    }
    for (int i = 0; i < 4; i ++)
    {
        cout << pResult[i] << endl;
    }
}

The following is my main function:

int main(int argc,char* argv[])
{
    float left[4] = {1, 1, 1, 1};
    float right[4] = {1, 1, 1, 1};
    float result[4] = {1, 1, 1, 1};
    ComputeArrayCPlusPlusSSE(left, right, result, 4);
    system("pause");
    return 0;

}

When it runs, Visual Studio 2012 reports an access violation in the ComputeArrayCPlusPlusSSE function at these lines:

m1 = _mm_mul_ps(*pSrc1, *pSrc1);        // m1 = *pSrc1 * *pSrc1
m2 = _mm_mul_ps(*pSrc2, *pSrc2);        // m2 = *pSrc2 * *pSrc2

I don't know why. There is no syntax error in my code, the arrays have been initialized, and the __m128 data has been initialized as well. I hope someone can help me out. Thanks in advance.

GoingMyWay
  • You might need to load your floats into a register first using `_mm_loadu_ps` instead of casting the `float *` to `__m128 *`. – 1201ProgramAlarm Dec 30 '15 at 05:41
  • If you get an error when *running* the compiled program, then it's likely an alignment error. `__m128` values need to be 16 byte aligned. That means you need to allocate your float arrays with 16 byte alignment. – Ross Ridge Dec 30 '15 at 05:55
  • [Please stop using implicit loads/stores](http://stackoverflow.com/questions/34440850/segmentation-fault-with-array-of-m256i-when-using-clang-g/34499562#34499562). – Z boson Dec 30 '15 at 08:37
  • @1201ProgramAlarm, thank you, that helps. I then use `_mm_stream_ps` to store the data to memory. – GoingMyWay Dec 30 '15 at 11:10
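
The suggestions in the comments can be sketched roughly as follows. This is a minimal illustration, not the asker's final code: the function name `ComputeArraySSEUnaligned` is hypothetical, the `sqrt` is kept from the SSE version in the question (the scalar version does not take it), and the scalar tail loop for sizes that are not a multiple of 4 is an addition. It replaces the `__m128*` casts with explicit `_mm_loadu_ps` and `_mm_storeu_ps` calls, which do not require 16-byte alignment.

```cpp
#include <xmmintrin.h>  // SSE intrinsics (_mm_loadu_ps, _mm_storeu_ps, ...)
#include <cmath>        // std::sqrt for the scalar tail

// Hypothetical variant of ComputeArrayCPlusPlusSSE from the question.
// Computes pResult[i] = sqrt(pArray1[i]^2 + pArray2[i]^2) for i in [0, nSize).
// The arrays may have any alignment because only unaligned loads/stores are used.
void ComputeArraySSEUnaligned(
          const float* pArray1,             // [in] first source array
          const float* pArray2,             // [in] second source array
          float* pResult,                   // [out] result array
          int nSize)                        // [in] size of all arrays
{
    int i = 0;

    // Process four floats per iteration.
    for (; i + 4 <= nSize; i += 4)
    {
        __m128 a   = _mm_loadu_ps(pArray1 + i);              // unaligned load
        __m128 b   = _mm_loadu_ps(pArray2 + i);              // unaligned load
        __m128 sum = _mm_add_ps(_mm_mul_ps(a, a), _mm_mul_ps(b, b));
        _mm_storeu_ps(pResult + i, _mm_sqrt_ps(sum));        // unaligned store
    }

    // Scalar tail for the remaining 0 to 3 elements (an addition, not in the question).
    for (; i < nSize; ++i)
    {
        pResult[i] = std::sqrt(pArray1[i] * pArray1[i] + pArray2[i] * pArray2[i]);
    }
}
```

The other route mentioned in the comments is to keep aligned accesses and guarantee the alignment instead, for example by declaring the arrays with `__declspec(align(16))` in Visual Studio or allocating them with `_mm_malloc(size, 16)` and releasing them with `_mm_free`. Note that `_mm_stream_ps` also expects a 16-byte aligned destination, so it fits that second route rather than the unaligned one shown here.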
