
I have some trouble with a "special" kind of conditional structure in SSE/C++. The following pseudo code illustrates what I want to do:

    for-loop ...
        // some SSE calculations
        __m128i a = ... // a contains four 32-bit ints
        __m128i b = ... // b contains four 32-bit ints

        if any of the four ints in a is less than its corresponding int in b
            vector.push_back(e.g. first component of a)

So I do quite a few SSE calculations, and as the result of these calculations I have two __m128i values, each containing four 32-bit integers. This part is working fine. But now I want to push something into a vector if at least one of the four ints in a is less than the corresponding int in b. I have no idea how to achieve this.

I know the _mm_cmplt_epi32 function, but so far I have failed to use it to solve my specific problem.

EDIT:

Yeah, actually I'm searching for a clever solution. I have a solution, but it looks very, very strange:

    for-loop ...
        // some SSE calculations
        __m128i a = ... // a contains four 32-bit ints
        __m128i b = ... // b contains four 32-bit ints

        long long i[2] __attribute__((aligned (16)));

        // each lane of cmp is all ones where a < b, otherwise zero
        __m128i cmp = _mm_cmplt_epi32(a, b);
        _mm_store_si128(reinterpret_cast<__m128i*>(i), cmp);

        if (i[0] || i[1]) {
            vector.push_back(...)
        }

I hope there is a better way...

user1494080
  • can you think of no way to do this, or no "clever, compact" way? Just a lengthy byte operation (and with 0xFFFFFFFF, compare, act, right shift 32, repeat) would do it - are you after something better? – Floris Dec 06 '13 at 02:11
  • Yes, I'm searching for a clever way, I have edited my question... – user1494080 Dec 06 '13 at 02:21

2 Answers


You want to use the _mm_movemask_ps function, which will return an appropriate bitmask which you can test:

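    __m128i cmp = _mm_cmplt_epi32(a, b);

    // _mm_movemask_ps takes a __m128, so reinterpret the integer result first
    if (_mm_movemask_ps(_mm_castsi128_ps(cmp)))
    {
        vector.push_back(...);
    }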

Documented here: http://msdn.microsoft.com/en-us/library/4490ys29%28v=vs.90%29.aspx
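As a minimal, self-contained sketch of this test, assuming an x86 target with SSE2: the helper names `any_lane_less`, `any_lane_less_ps` and `push_first_if_any_less` are hypothetical and only serve the illustration. Because `_mm_movemask_ps` takes a `__m128`, the integer compare result is reinterpreted with `_mm_castsi128_ps`; the `_mm_movemask_epi8` variant needs no cast.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <vector>

// Hypothetical helper: true if any signed 32-bit lane of a is less than
// the corresponding lane of b. Stays in the integer domain.
inline bool any_lane_less(__m128i a, __m128i b) {
    __m128i cmp = _mm_cmplt_epi32(a, b);  // lane is 0xFFFFFFFF where a < b, else 0
    return _mm_movemask_epi8(cmp) != 0;   // collects the top bit of each of the 16 bytes
}

// Same test using the float movemask: one bit per 32-bit lane.
inline bool any_lane_less_ps(__m128i a, __m128i b) {
    __m128i cmp = _mm_cmplt_epi32(a, b);
    return _mm_movemask_ps(_mm_castsi128_ps(cmp)) != 0;
}

// Hypothetical helper: append the first component of a if any lane of a
// is less than the corresponding lane of b.
inline void push_first_if_any_less(std::vector<int>& out, __m128i a, __m128i b) {
    if (any_lane_less(a, b)) {
        out.push_back(_mm_cvtsi128_si32(a));  // extracts lane 0
    }
}
```

Inside the loop from the question this would be called as `push_first_if_any_less(vector, a, b);`.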

beerboy
  • Thanks, that's the kind of function I was searching for. But I think in my context, _mm_movemask_epi8 is somewhat more suitable, as I'm operating on ints (__m128i) and not floats (__m128). – user1494080 Dec 06 '13 at 17:03
  • @user1494080, you can probably use `_mm_movemask_ps` as well without any penalty. But there is a subtle reason to stay in the same execution unit in some cases. See http://stackoverflow.com/questions/19543590/bypass-delays-when-switching-execution-unit-domains – Z boson Dec 06 '13 at 17:44

I did something similar to this to find prime numbers; see "Finding lists of prime numbers with SIMD - SSE/AVX".

This is only going to be useful with SSE if the result of the comparison is false most of the time. Otherwise you should just use scalar code. Let me try to lay out the code.

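    __m128i cmp = _mm_cmplt_epi32(a, b);
    if (_mm_movemask_epi8(cmp)) {
        int out[4] __attribute__((aligned (16)));
        // keep a's value in the lanes where a < b, zero the other lanes
        _mm_store_si128(reinterpret_cast<__m128i*>(out), _mm_and_si128(cmp, a));
        // note: a lane where a itself is 0 is skipped by this test as well
        for (int i = 0; i < 4; i++) if (out[i]) vector.push_back(out[i]);
    }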

You could store the comparison instead of using the logical AND. Additionally, you could mask the bytes in the move mask and skip the store. Whichever way you do it, what really matters is that the movemask is zero most of the time; otherwise SSE won't be helpful.
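One way to sketch the movemask-driven variant, assuming SSE2: the bits of the mask decide which lanes to push, so the logical AND is dropped and a lane whose value is 0 is no longer skipped. This sketch still stores `a` once to read the individual lanes, so it is not a fully store-free version. The helper name `push_lanes_less` is hypothetical.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <vector>

// Hypothetical helper: append every lane of a that is less than the
// corresponding lane of b (signed 32-bit compare).
inline void push_lanes_less(std::vector<int>& out, __m128i a, __m128i b) {
    __m128i cmp = _mm_cmplt_epi32(a, b);
    // One bit per 32-bit lane: bit i is set when lane i of a < lane i of b.
    int mask = _mm_movemask_ps(_mm_castsi128_ps(cmp));
    if (mask == 0) return;  // expected to be the common case

    alignas(16) int lanes[4];
    _mm_store_si128(reinterpret_cast<__m128i*>(lanes), a);
    for (int i = 0; i < 4; ++i) {
        if (mask & (1 << i)) out.push_back(lanes[i]);
    }
}
```

The early return keeps the scalar loop off the hot path, which is where the speedup comes from when the mask is usually zero.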

In my case, a was a list of numbers I wanted to test for primality and b was a list of divisors. Since I knew that most of the time the values of a were not prime, this gave me a boost of about 3x (out of a maximum of 4x with SSE).

Community
Z boson