
My question is about the most efficient place to define __m128/__m128i compile-time constants in intrinsics-based code.

Considering two options:

Option A

__m128i Foo::DoMasking(const __m128i value) const
{
    //defined in method
    const __m128i mask = _mm_set1_epi32(0x00FF0000);
    return _mm_and_si128(value, mask);
}

Option B

//Foo.h
const __m128i mask = _mm_set1_epi32(0x00FF0000);

//Foo.cpp
__m128i Foo::DoMasking(const __m128i value) const
{
    return _mm_and_si128(value, mask);
}
  • Will option A incur a performance penalty, or will it be optimized away to an equivalent of option B?
  • Is there an even better option C?
  • Does the answer change depending on whether or not the method is inlined?
  • Is _mm_set1_epi32/_mm_set_epi32 the best way to load the constants? I've seen some questions in which an int[4] is generated and cast to an __m128i.
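
For reference, the two ways of building the constant mentioned in the last bullet can be sketched as follows. This is only an illustration, not taken from any of the questions referred to: the names MaskFromSet1, MaskFromArray, kMaskData and SameBits are hypothetical, and the array variant uses an explicit aligned load rather than a raw pointer cast.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cassert>
#include <cstdint>

// Way 1: broadcast the 32-bit value with the intrinsic.
inline __m128i MaskFromSet1()
{
    return _mm_set1_epi32(0x00FF0000);
}

// Way 2: keep the constant in a 16-byte-aligned array (hypothetical name)
// and load it explicitly.
alignas(16) static const std::int32_t kMaskData[4] =
    {0x00FF0000, 0x00FF0000, 0x00FF0000, 0x00FF0000};

inline __m128i MaskFromArray()
{
    // _mm_load_si128 requires a 16-byte-aligned address.
    assert(reinterpret_cast<std::uintptr_t>(kMaskData) % 16 == 0);
    return _mm_load_si128(reinterpret_cast<const __m128i*>(kMaskData));
}

// Returns true if both vectors hold the same 128 bits.
inline bool SameBits(__m128i a, __m128i b)
{
    return _mm_movemask_epi8(_mm_cmpeq_epi8(a, b)) == 0xFFFF;
}
```

Both functions produce the same 128-bit value; whether the compiler emits the same instructions for them is exactly what the disassembly would have to show.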

I know the appropriate answer to all of these questions is "check the disassembly!", but I'm inexperienced in both generating it and interpreting it.

I am compiling on MSVC with maximum optimization.

Rotem

1 Answer


Option A will probably be OK: when the compiler inlines this function it should hoist the mask constant out of any loops. In my experience, though, the safest option, particularly if you want this to work reliably across multiple platforms/compilers, is to refactor it into a slightly less elegant but potentially more efficient form, where the caller creates the mask once and passes it in:

__m128i Foo::DoMasking(const __m128i value, const __m128i mask) const
{
    return _mm_and_si128(value, mask);
}

void Foo::DoLotsOfMasking(...)
{
    const __m128i mask = _mm_set1_epi32(0x00FF0000);

    for (int i = 0; ...; ...)
    {
        // ...
        v[i] = DoMasking(x[i], mask);
        // ...
    }
}
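
A minimal self-contained sketch of the same pattern, with the elided parts filled in by assumptions: DoMasking is written as a free function rather than a member of Foo, and MaskBuffer (a hypothetical name) stands in for DoLotsOfMasking, processing a buffer of 32-bit values whose length is assumed to be a multiple of 4.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cassert>
#include <cstddef>
#include <cstdint>

// Free-function stand-in for Foo::DoMasking: the mask is a parameter.
static inline __m128i DoMasking(__m128i value, __m128i mask)
{
    return _mm_and_si128(value, mask);
}

// Hypothetical caller: masks n 32-bit values, four at a time.
void MaskBuffer(const std::uint32_t* in, std::uint32_t* out, std::size_t n)
{
    assert(n % 4 == 0);  // assumption: no scalar tail handling in this sketch

    // Built once, before the loop, so it can stay in a register.
    const __m128i mask = _mm_set1_epi32(0x00FF0000);

    for (std::size_t i = 0; i < n; i += 4)
    {
        // Unaligned load/store so the caller's buffers need no special alignment.
        const __m128i x = _mm_loadu_si128(reinterpret_cast<const __m128i*>(in + i));
        _mm_storeu_si128(reinterpret_cast<__m128i*>(out + i), DoMasking(x, mask));
    }
}
```

Calling MaskBuffer(in, out, 8) on two 8-element arrays keeps only bits 16 to 23 of each input element.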
Paul R
  • Thanks, I was hoping you'd answer this question :) Portability is not a concern in this case. If I understand you correctly, it is more efficient (or *as* efficient) to pass the constant to the function than to initialize it in the header or class? – Rotem Oct 10 '13 at 08:34
  • Well, there are no guarantees with this sort of thing, and different versions of the same compiler, or different compiler options, may result in different behaviour. I think either Option A or B should be OK, although I would prefer A, but my alternative is probably a safer bet, since the constant will typically be generated once, before the loop, and kept in a register. As always, profile your code! – Paul R Oct 10 '13 at 08:43
  • How do you handle SIMD constants in `C`? While in `C++` one could do `static __m128 const a = _mm_set1_ps(12102203.2f);` and use `a` in the code, what should one do in `C`? – Royi Sep 02 '18 at 15:03
  • @PaulR, I tried explaining myself better at https://stackoverflow.com/questions/52139380. – Royi Sep 02 '18 at 17:41