
I would like to create an SSE register with values that I can store in an array of integers, from another SSE register which contains flags 0xFFFF and zeros. For example:

__m128i regComp = _mm_cmpgt_epi16(regA, regB);

For the sake of argument, let's assume that regComp was loaded with { 0, 0xFFFF, 0, 0xFFFF }. I would like to convert this into, say, { 0, 80, 0, 80 }.

What I had in mind was to create an array of integers initialized to 80 and load it into a register regC, then do a _mm_and_si128 between regC and regComp and store the result in regD. However, this does not do the trick, which led me to think that I do not understand the "true" flags in SSE registers. Could someone answer the question with a brief explanation of why my solution does not work?

short valA[16] = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16 };
short valB[16] = { 5, 5, 5, 5, 5, 5, 5, 5, 5, 10, 10, 10, 10, 10, 10, 10 };
short ones[16] = { 1 };
short final[16];

__m128i vA, vB, vOnes, vRes, vRes2;

vOnes = _mm_load_si128((__m128i *)&(ones)[0] );

for( i=0 ; i < 16 ;i+=8){
   vA = _mm_load_si128((__m128i *)&(valA)[i] );
   vB = _mm_load_si128((__m128i *)&(valB)[i] );

   vRes = _mm_cmpgt_epi16(vA,vB);

   vRes2 = _mm_and_si128(vRes,vOnes);
   _mm_storeu_si128((__m128i *)&(final)[i], vRes2);
 }
Mysticial
a3mlord
  • Actually, this works. Could you post a complete code? – galinette Jun 22 '15 at 08:41
  • If your variables / arrays already have `__m128*` type, you can just use them directly. No need to take their address and do a `_mm_load`. The load intrinsics are mostly there to avoid ugly casts from pointers to scalar types. (Also if you aren't using AVX, and need to load unaligned data, then you need the `loadu` intrinsics.) – Peter Cordes Jul 02 '15 at 22:55
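To illustrate the aligned versus unaligned load point from the comment above, here is a minimal sketch. The array names and the two helper functions are hypothetical and exist only for this example; it assumes a C11 compiler targeting x86 with SSE2.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdalign.h>   /* alignas (C11) */

/* alignas(16) gives the 16-byte alignment that _mm_load_si128 requires. */
static alignas(16) short aligned_vals[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };

/* A plain array carries no 16-byte alignment guarantee, and plain_vals + 1
   is certainly not 16-byte aligned if plain_vals happens to be. */
static short plain_vals[9] = { 0, 1, 2, 3, 4, 5, 6, 7, 8 };

/* Hypothetical helper: aligned load, then read back lane 0. */
static short first_lane_aligned(void) {
    __m128i v = _mm_load_si128((const __m128i *)aligned_vals);
    return (short)_mm_extract_epi16(v, 0);
}

/* Hypothetical helper: unaligned load from an arbitrary offset. */
static short first_lane_unaligned(void) {
    __m128i v = _mm_loadu_si128((const __m128i *)(plain_vals + 1));
    return (short)_mm_extract_epi16(v, 0);
}
```

When the alignment of the data is not known, `_mm_loadu_si128` is the safe choice.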

2 Answers


You only set the first element of array ones to 1 (the rest of the array is initialised to 0).

I suggest you get rid of the array ones altogether and then change this line:

vOnes = _mm_load_si128((__m128i *)&(ones)[0] );

to:

vOnes = _mm_set1_epi16(1);
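The difference between the two ways of filling the register can be checked with a small sketch like the following. The helper names are hypothetical, and an 8-element array is used here so that it fills exactly one register.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: counts how many of the eight 16-bit lanes equal 1. */
static int count_lanes_equal_to_one(__m128i v) {
    short lanes[8];
    _mm_storeu_si128((__m128i *)lanes, v);
    int n = 0;
    for (int i = 0; i < 8; ++i) {
        n += (lanes[i] == 1);
    }
    return n;
}

/* Loading from an array declared as { 1 }: only ones[0] is 1,
   the remaining elements are zero-initialised. */
static int ones_from_array(void) {
    short ones[8] = { 1 };
    return count_lanes_equal_to_one(_mm_loadu_si128((const __m128i *)ones));
}

/* _mm_set1_epi16 broadcasts the value to all eight lanes. */
static int ones_from_set1(void) {
    return count_lanes_equal_to_one(_mm_set1_epi16(1));
}
```

Here `ones_from_array()` finds a single lane equal to 1, while `ones_from_set1()` finds all eight.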

Probably a better solution though, if you just want to convert SIMD TRUE (0xffff) results to 1, would be to use a shift:

for (i = 0; i < 16; i += 8) {
    vA = _mm_loadu_si128((__m128i *)&valA[i]);   // valA/valB as declared in the question
    vB = _mm_loadu_si128((__m128i *)&valB[i]);

    vRes = _mm_cmpgt_epi16(vA, vB);    // generate 0xffff/0x0000 results

    vRes = _mm_srli_epi16(vRes, 15);   // convert to 1/0 results

    _mm_storeu_si128((__m128i *)&final[i], vRes);   // store the shifted result
}
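The shift only produces 1/0. For an arbitrary value such as the 80 from the question, the AND approach still applies once every lane of the constant really holds that value. A minimal sketch, where the function name and its parameters are hypothetical:

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: for 8 lanes, out[i] = (a[i] > b[i]) ? value : 0. */
static void select_value_where_greater(const short *a, const short *b,
                                       short value, short *out) {
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);

    /* Each lane becomes 0xFFFF where a > b, otherwise 0x0000. */
    __m128i mask = _mm_cmpgt_epi16(va, vb);

    /* Broadcast value to all lanes, then keep it only where the mask is set. */
    __m128i vals = _mm_set1_epi16(value);
    _mm_storeu_si128((__m128i *)out, _mm_and_si128(mask, vals));
}
```

Because a true lane has all 16 bits set, ANDing it with any value returns that value unchanged, and ANDing a false lane returns 0.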
Paul R
  • That was my next question... what would be the best solution to this. I assume that the shift is better. Thanks! – a3mlord Jun 22 '15 at 08:57
  • Using a shift may be slightly more efficient (it uses one less register and no additional data). In practice it may make no difference, but I'd still prefer it. – Paul R Jun 22 '15 at 08:59
  • Right. Thanks Paul. BTW, I might have now questions whether someone can come up with a better solution for specific parts of my code (not this kernel). Is this acceptable to ask in SO? – a3mlord Jun 22 '15 at 09:04
  • According to http://stackoverflow.com/questions/201101/how-to-initialize-an-array-in-c my = { 0 } would initialize all the elements to 0. Why doesn't this work? – a3mlord Jun 22 '15 at 09:16
  • When you initialise one or more elements explicitly, any remaining elements are initialised to 0. So `= { 0 }` works when you want all elements initialised to 0, but you can't use `= { 1 }` to initialise all elements to 1 (it's effectively just `= { 1, 0, 0, 0, ... }`). – Paul R Jun 22 '15 at 10:38
  • And yes, if you have further questions related to your code then go ahead and post them *as new questions* here on StackOverflow. – Paul R Jun 22 '15 at 10:40
  • this is too simple to open a new question: what is the most logical way to verify if *any* of the register positions in your instruction `vRes = _mm_cmpgt_epi16(vA, vB);` did return `0xFFFF`? Thanks. – a3mlord Jun 23 '15 at 14:00
  • For this type of thing you'd usually use `_mm_movemask_epi8`. – Paul R Jun 23 '15 at 14:37
  • This way: `if ( _mm_movemask_epi8( _mm_cmpgt_epi16(vA, vB) ) ){...}`? – a3mlord Jun 23 '15 at 14:56
  • Sure - if any element is true then the mask will be non-zero. – Paul R Jun 23 '15 at 15:54
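The `_mm_movemask_epi8` test discussed in the comments above can be sketched as follows. The helper names are hypothetical; `all_greater` is an extra variant added here for comparison and is not part of the original discussion.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: returns 1 if a[i] > b[i] in at least one of 8 lanes. */
static int any_greater(const short *a, const short *b) {
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    __m128i mask = _mm_cmpgt_epi16(va, vb);

    /* _mm_movemask_epi8 collects the top bit of each of the 16 bytes,
       so any true 16-bit lane contributes two set bits. */
    return _mm_movemask_epi8(mask) != 0;
}

/* Hypothetical helper: returns 1 only if a[i] > b[i] in all 8 lanes. */
static int all_greater(const short *a, const short *b) {
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    __m128i mask = _mm_cmpgt_epi16(va, vb);

    /* All 16 byte sign bits set means every lane compared true. */
    return _mm_movemask_epi8(mask) == 0xFFFF;
}
```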

Try this for loading 1:

vOnes = _mm_set1_epi16(1);

This is shorter than creating a constant array.

Be careful: in C and C++, if you provide fewer initializers than the array has elements, the remaining elements are initialized to zero. That was your error, not the SSE part.

Don't forget the debugger; modern ones display SSE variables properly.

galinette
  • According to http://stackoverflow.com/questions/201101/how-to-initialize-an-array-in-c my = { 0 } would initialize all the elements to 0. Why doesn't this work? – a3mlord Jun 22 '15 at 09:16
  • @a3mlord because "elements with missing values will be initialized to 0", it is not the case that all elements will get a copy of the value you provide, that's only incidentally the case when that element happens to be zero. – harold Jun 22 '15 at 10:20