24

I am trying to compile a C program that uses SIMD intrinsics with CMake. When I try to compile it, I get two errors:

/usr/lib/gcc/x86_64-linux-gnu/5/include/smmintrin.h:326:1: error: inlining failed in call to always_inline ‘_mm_mullo_epi32’: target specific option mismatch _mm_mullo_epi32 (__m128i __X, __m128i __Y)

/usr/lib/gcc/x86_64-linux-gnu/5/include/tmmintrin.h:136:1: error: inlining failed in call to always_inline ‘_mm_shuffle_epi8’: target specific option mismatch _mm_shuffle_epi8 (__m128i __X, __m128i __Y)

This issue has already been solved on StackOverflow by setting:

set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -msse4.1")

I tried the very same setting and many other options, but my project still fails to compile:

set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -msse4.1")  
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -sse4_1")  
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -march=nehalem")  
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -msse4.1 -msse4.2")  
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -march=native")  
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -ssse3")  
Lawan subba

2 Answers

21

A general method to find the GCC instruction-set switch that a given intrinsic needs:

File intrin.sh:

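#!/bin/bash

# Print the ISA name (e.g. "sse4.1") taken from the "#pragma GCC target"
# line that precedes the given intrinsic in GCC's *intrin.h headers.
get_instruction ()
{
    [ -z "$1" ] && exit
    # Trailing space avoids matching longer names that share this prefix.
    func_name="$1 "

    # First intrinsics header under /usr/lib/gcc that mentions the function.
    header_file=$(grep --include=\*intrin.h -Rl "$func_name" /usr/lib/gcc | head -n1)
    [ -z "$header_file" ] && exit
    >&2 echo "found in: $header_file"

    # Keep only pragma lines and lines naming the function, then take the
    # line just before the first line that names the function.
    target_directive=$(grep "#pragma GCC target(\|$func_name" "$header_file" | grep -B 1 "$func_name" | head -n1)
    # Extract the first quoted target name and strip quotes and commas.
    echo "$target_directive" | grep -o '"[^,]*[,"]' | sed 's/"//g' | sed 's/,//g'
}

instruction=$(get_instruction "$1")
if [ -z "$instruction" ]; then
    echo "Error: function not found: $1"
else
    echo "add this option to gcc: -m$instruction"
fi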

Usage:

./intrin.sh _mm_shuffle_epi8      # output: -mssse3
./intrin.sh _mm_cvtepu8_epi32     # output: -msse4.1
./intrin.sh _mm_loadu_ps          # output: -msse
./intrin.sh _mm_clmulepi64_si128  # output: -mpclmul
./intrin.sh _mm256_loadu_si256    # output: -mavx
./intrin.sh _mm512_and_ps         # output: -mavx512dq
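
To make the last pipeline in the script less opaque, here is a minimal sketch that isolates the extraction step and runs it on two sample lines. The function name `extract_isa` and both sample pragma lines are assumptions for illustration, modeled on the `#pragma GCC target(...)` form the script searches for; they are not copied from a specific header.

```shell
#!/bin/bash

# Hypothetical helper: the extraction step of intrin.sh on its own.
extract_isa () {
    # Match from the first double quote up to the first comma or closing
    # quote, then strip the quote and comma characters.
    echo "$1" | grep -o '"[^,]*[,"]' | sed 's/"//g' | sed 's/,//g'
}

# Sample inputs (illustrative, not taken from a real header).
extract_isa '#pragma GCC target("sse4.1")'
extract_isa '#pragma GCC target("avx512vl,avx512bw")'
```

This should print `sse4.1` and then `avx512vl`. Only the first name of a comma-separated target list is kept, so an intrinsic that needs several extensions may require more `-m` options than the script reports.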
Pamela
  • 2
    Note that it's usually a good idea to use something like `-march=haswell`, not just `-mavx2 -mfma`. Or at least add `-mtune=znver2` (Zen 2) or something onto your `-m` ISA options. The "generic" tuning can be pretty poor for possibly-unaligned 256-bit vectors, especially when your data is usually aligned at runtime but the compiler just doesn't know that. See [Why doesn't gcc resolve \_mm256\_loadu\_pd as single vmovupd?](https://stackoverflow.com/q/52626726). Or if you want to make a binary for your own machine, `-march=native`. – Peter Cordes Sep 03 '20 at 01:35
  • Excellent answer! – f10w Mar 02 '21 at 12:08
18

Since you are compiling C code, not C++, you need:

set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -msse4.1")

You can get rid of all the other -march=XXX and -msseXXX settings.

If you're using a mix of C and C++ then you could also add:

set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -msse4.1")
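
One way to check that a flag really enables the extension an intrinsic needs is to ask the compiler which feature macros it predefines for that flag. The sketch below assumes GCC (or a compatible compiler named by the `CC` environment variable) on an x86 machine; the helper name `sse_macros` is hypothetical.

```shell
#!/bin/bash

# Hypothetical helper: list the SSE-related macros the C compiler
# predefines when given the flags passed as arguments.
# Assumption: CC defaults to gcc when the variable is unset.
sse_macros () {
    # -dM -E dumps all predefined macros; -x c treats /dev/null as C source.
    "${CC:-gcc}" "$@" -dM -E -x c /dev/null | grep -E '__SSE|__SSSE' | sort
}

# Example invocations (uncomment to run):
# sse_macros -msse4.1
# sse_macros -march=native
```

If `__SSE4_1__` appears in the output for the flags you pass, code compiled with those flags can call `_mm_mullo_epi32`; likewise `__SSSE3__` for `_mm_shuffle_epi8`. This only checks the compiler flags themselves, not whether CMake passes them to the C compiler.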
Paul R
  • 2
    I also had to add -maes, or it did not work for me: set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -msse4.1 -maes") – Rostfrei Jun 26 '18 at 12:46
  • 5
    Or better, use `-march=native` if compiling for your own machine. That will enable everything your CPU has, and set tuning options. – Peter Cordes Jul 23 '20 at 14:49