
I am trying to compile this project from GitHub, which is implemented in C++ with SIMD intrinsics (SSE4.1). The project on GitHub is provided as a Visual Studio solution, but I am trying to port it to Qt Creator with CMake. When I try to compile it, I get the following error:

/usr/lib/gcc/x86_64-unknown-linux-gnu/5.3.0/include/smmintrin.h:520:1: error: inlining failed in call to always_inline '__m128i _mm_cvtepu8_epi32(__m128i)': target specific option mismatch
 _mm_cvtepu8_epi32 (__m128i __X)

I am sure this has to do with the SSE optimization part, but since I am not very familiar with the subject, I don't know what the error means or how to fix it, and searching online didn't turn up anything useful. The code that triggers the error is:

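#include <opencv2/core.hpp>
#include <smmintrin.h> // SSE4.1 intrinsics such as _mm_cvtepu8_epi32

using cv::Mat;

// Converts an 8-bit single-channel image to float, multiplying every pixel by amp.
// Assumes src and dest are continuous, dest is already allocated as CV_32F with the
// same size, and dest's data is 16-byte aligned (required by _mm_store_ps).
static void cvt8u32f(const Mat& src, Mat& dest, const float amp)
{
    const int imsize = src.size().area()/8;        // number of full 8-pixel blocks
    const int nn = src.size().area() - imsize*8;   // leftover pixels
    const uchar* s = src.ptr<uchar>(0);
    float* d = dest.ptr<float>(0);
    const __m128 mamp = _mm_set_ps1(amp);
    for(int i=imsize;i--;)
    {
        // Load 8 bytes into the low 64 bits of the register.
        __m128i s1 = _mm_loadl_epi64((const __m128i*)s);

        // Bytes 0-3: zero-extend to int32 (SSE4.1), convert to float, scale.
        _mm_store_ps(d,_mm_mul_ps(mamp,_mm_cvtepi32_ps(_mm_cvtepu8_epi32(s1))));
        // Bytes 4-7: shift them down by 4 bytes first, then do the same.
        _mm_store_ps(d+4,_mm_mul_ps(mamp,_mm_cvtepi32_ps(_mm_cvtepu8_epi32(_mm_srli_si128(s1,4)))));
        s+=8;
        d+=8;
    }
    // Scalar loop for the remaining pixels.
    for(int i=0;i<nn;i++)
    {
        *d = (float)*s * amp;
        s++,d++;
    }
}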

Can someone explain what the issue is and what I am missing? Thanks in advance.

ttsesm
  • *target specific option mismatch* seems to indicate that your (default?) compile target doesn't support SSE4.1. Perhaps [an `-mxxx` parameter](http://stackoverflow.com/questions/10686638/whats-the-differrence-among-cflgs-sse-options-of-msse-msse2-mssse3-msse4) can persuade the compiler? – Bo Persson Mar 03 '16 at 12:50
  • Yup, use `-msse4.1` for `pmovzx`. That's the usual message for intrinsics that you haven't told the compiler the target supports. It also tells the compiler it can use up to SSE4.1 when auto-vectorizing. If that's a problem (runtime CPU dispatching), then use separate compilation units. Also, `-march=nehalem` would enable SSE4.2 and `-mpopcnt` as well. – Peter Cordes Mar 03 '16 at 13:01
  • Thanks to both of you; indeed, adding `set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -msse4.1")` to the CMakeLists.txt did the trick. Thanks again. – ttsesm Mar 03 '16 at 14:30
  • FWIW, for me it maybe meant "make sure it doesn't include the xmmintrin.h file by accident". – rogerdpack Jan 24 '19 at 05:45
  • Also, be aware that this message means your CPU might not support the instructions, in which case you can still try to compile with `-msse4.1`, but you might not be able to run the result afterwards. – Romeo Valentin May 11 '20 at 10:38
  • Using `-mavx` did it for me. – Pawan Nirpal Jan 09 '23 at 08:38

1 Answer

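Add this to your .pro file: `QMAKE_CXXFLAGS += -msse3` (note that `-msse3` only enables instruction sets up to SSE3; SSE4.1 intrinsics such as `_mm_cvtepu8_epi32` need `QMAKE_CXXFLAGS += -msse4.1`).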

Olga