My question is about the most efficient place to define __m128/__m128i compile-time constants in intrinsics-based code.
Consider two options:
Option A
__m128i Foo::DoMasking(const __m128i value) const
{
    //defined in method
    const __m128i mask = _mm_set1_epi32(0x00FF0000);
    return _mm_and_si128(value, mask);
}
Option B
//Foo.h
const __m128i mask = _mm_set1_epi32(0x00FF0000);
//Foo.cpp
__m128i Foo::DoMasking(const __m128i value) const
{
    return _mm_and_si128(value, mask);
}
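For anyone who wants to compile and compare the two options directly, here is a minimal single-file sketch, assuming an x86/x86-64 target with SSE2 (`<emmintrin.h>`). Because `Foo` is not shown, the methods are written as free functions, and `DoMaskingA`, `DoMaskingB`, `kMaskB` and `Lane` are hypothetical names. Two details of Option B are worth knowing when reading the disassembly. First, a namespace-scope `const` object has internal linkage in C++, so every .cpp that includes Foo.h gets its own copy of `mask`. Second, `_mm_set1_epi32` is an ordinary function call rather than a guaranteed constant expression, so the compiler is allowed to initialize that copy at program start-up instead of at compile time.

```cpp
#include <cassert>
#include <cstdint>
#include <emmintrin.h>  // SSE2: __m128i, _mm_set1_epi32, _mm_and_si128, _mm_store_si128

// Option A: the constant is built inside the function.
__m128i DoMaskingA(const __m128i value)
{
    const __m128i mask = _mm_set1_epi32(0x00FF0000);
    return _mm_and_si128(value, mask);
}

// Option B: the constant lives at namespace scope (in the question it is in Foo.h).
// _mm_set1_epi32 is not guaranteed to be a constant expression, so this object
// may be initialized dynamically at start-up; check the disassembly to see.
const __m128i kMaskB = _mm_set1_epi32(0x00FF0000);

__m128i DoMaskingB(const __m128i value)
{
    return _mm_and_si128(value, kMaskB);
}

// Hypothetical test helper: return 32-bit lane i (lane 0 is the lowest 32 bits).
std::int32_t Lane(const __m128i v, int i)
{
    assert(i >= 0 && i < 4);
    alignas(16) std::int32_t lanes[4];
    _mm_store_si128(reinterpret_cast<__m128i*>(lanes), v);
    return lanes[i];
}
```

Both functions compute the same result; the question is only where and when the mask is materialized, which is what the generated code for each will show.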
- Will option A incur a performance penalty, or will it be optimized away to an equivalent of option B?
- Is there an even better option C?
- Does the answer change depending on whether or not the method is inlined?
- Is _mm_set1_epi32/_mm_set_epi32 the best way to load the constants? I've seen some questions in which an int[4] is generated and cast to an __m128i.
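The int[4] approach from the last question can be sketched as follows, assuming SSE2 and C++11 `alignas`; `kMaskLanes`, `DoMaskingFromArray` and `Lane` are hypothetical names. `_mm_load_si128` requires a 16-byte-aligned address, which is why the array is declared `alignas(16)` (`_mm_loadu_si128` would drop that requirement). Since the array is initialized with constants, it is statically initialized and typically placed in read-only data. This is not necessarily faster than `_mm_set1_epi32`: optimizing compilers often turn a constant `_mm_set1_epi32` into a load from read-only data anyway, so the disassembly is the way to compare.

```cpp
#include <cassert>
#include <cstdint>
#include <emmintrin.h>  // SSE2: __m128i, _mm_load_si128, _mm_and_si128, _mm_store_si128

// Hypothetical: the mask stored as four 32-bit lanes in a 16-byte-aligned array.
alignas(16) const std::int32_t kMaskLanes[4] = {
    0x00FF0000, 0x00FF0000, 0x00FF0000, 0x00FF0000};

__m128i DoMaskingFromArray(const __m128i value)
{
    // Aligned load: valid because kMaskLanes is declared alignas(16).
    const __m128i mask = _mm_load_si128(reinterpret_cast<const __m128i*>(kMaskLanes));
    return _mm_and_si128(value, mask);
}

// Hypothetical test helper: return 32-bit lane i (lane 0 is the lowest 32 bits).
std::int32_t Lane(const __m128i v, int i)
{
    assert(i >= 0 && i < 4);
    alignas(16) std::int32_t lanes[4];
    _mm_store_si128(reinterpret_cast<__m128i*>(lanes), v);
    return lanes[i];
}
```

Note that `_mm_set_epi32(e3, e2, e1, e0)` takes its arguments from the highest lane down, while an array initializer lists lane 0 first; with a uniform mask like this one the order does not matter.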
I know the appropriate answer to all of these questions is "check the disassembly!", but I'm inexperienced at both generating and interpreting it.
I am compiling with MSVC at maximum optimization.