1

I am trying to use SSE to operate on 4 pixels at a time, but I have a problem loading the image data into an __m128. My image data is a char buffer. Let's say my image is 1024 x 1024 and my filter is 16 x 16.

__m128 IMG_VALUES, FIL_VALUES, NEW_VALUES;
//ok:
IMG_VALUES=_mm_load_ps(&pInput[0]);
//hang below:
IMG_VALUES=_mm_load_ps(&pInput[1]);

I don't know how to handle indices 1, 2, 3 and so on. Thanks.

Paul R
manhon

1 Answer

2

If you really need to do this with floating point rather than integer/fixed point, then you will need to load your 8 bit data, unpack it to 32 bits (this requires two operations: 8 bit to 16 bit, then 16 bit to 32 bit), and then convert to float. This is horribly inefficient though, and you should look at doing this with e.g. 16 bit fixed point operations.

Note that for each 16 pixel load you will then have 4 blocks of 4 x float to process, i.e. your vectors of 16 x 8 bit pixels will become 4 x vectors of 4 x floats.

Summary of required intrinsics:

_mm_load_si128(...)       // load 16 x 8 bit values

_mm_unpacklo_epi8(...)    // unpack 8 bit -> 16 bit
_mm_unpackhi_epi8(...)

_mm_unpacklo_epi16(...)   // unpack 16 bit -> 32 bit
_mm_unpackhi_epi16(...)

_mm_cvtepi32_ps(...)      // convert 32 bit int -> float
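Put together, the intrinsics above might look like the following minimal sketch. The helper name `load16_u8_to_ps` is hypothetical, and the sketch assumes unsigned 8 bit pixels, so each unpack interleaves with zero to zero-extend. It also swaps in the unaligned `_mm_loadu_si128` for `_mm_load_si128`, on the assumption that the filter needs to start at arbitrary byte offsets such as `&pInput[1]`; the aligned loads (`_mm_load_si128`, `_mm_load_ps`) require a 16 byte aligned address.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: widen 16 unsigned 8 bit pixels starting at src
 * into 4 vectors of 4 floats. src does not have to be 16 byte aligned
 * because the unaligned load is used. */
static void load16_u8_to_ps(const uint8_t *src, __m128 out[4])
{
    const __m128i zero = _mm_setzero_si128();

    /* load 16 x 8 bit values (unaligned) */
    __m128i px = _mm_loadu_si128((const __m128i *)src);

    /* unpack 8 bit -> 16 bit: interleaving with zero zero-extends */
    __m128i lo16 = _mm_unpacklo_epi8(px, zero);  /* pixels 0..7  */
    __m128i hi16 = _mm_unpackhi_epi8(px, zero);  /* pixels 8..15 */

    /* unpack 16 bit -> 32 bit, then convert 32 bit int -> float */
    out[0] = _mm_cvtepi32_ps(_mm_unpacklo_epi16(lo16, zero)); /* pixels 0..3   */
    out[1] = _mm_cvtepi32_ps(_mm_unpackhi_epi16(lo16, zero)); /* pixels 4..7   */
    out[2] = _mm_cvtepi32_ps(_mm_unpacklo_epi16(hi16, zero)); /* pixels 8..11  */
    out[3] = _mm_cvtepi32_ps(_mm_unpackhi_epi16(hi16, zero)); /* pixels 12..15 */
}
```

Calling `load16_u8_to_ps(&pInput[1], v)` with `__m128 v[4]` would then give the four float vectors for pixels 1 to 16 of the buffer.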
Paul R
  • 1
    I thought it would be fun to write up a fixed-point answer to a new duplicate of this: http://stackoverflow.com/a/32288984/224132. I also made an unpack-to-FP-and-back version. It's trickier than you'd expect for unsigned pixels, because `packuswb` expects *signed* input. Using it on the output of `packusdw` means saturated 0xffff words are interpreted as -1 and get clamped to zero. I dealt with that by masking between the two pack steps, once `packusdw` has done the signed->unsigned conversion with saturation. The unpack is fine, with SSE4.1 `pmovzxbd` or SSSE3 `pshufb`. – Peter Cordes Aug 31 '15 at 00:37
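For the way back from float to 8 bit pixels, here is a minimal SSE2-only sketch. It is not the `packusdw` plus masking approach described in the comment above: it uses the signed pack `_mm_packs_epi32` (`packssdw`) for the first step, so the words handed to `_mm_packus_epi16` (`packuswb`) are already valid signed values. The helper name `store16_ps_to_u8` is hypothetical, and the sketch assumes the default rounding mode and inputs that fit in a 32 bit int.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: convert 4 vectors of 4 floats back to 16 unsigned
 * 8 bit pixels, clamping to 0..255. Assumes every input fits in int32;
 * larger magnitudes are not handled correctly by the conversion. */
static void store16_ps_to_u8(const __m128 in[4], uint8_t *dst)
{
    /* float -> 32 bit int (rounds according to the current MXCSR mode) */
    __m128i i0 = _mm_cvtps_epi32(in[0]);
    __m128i i1 = _mm_cvtps_epi32(in[1]);
    __m128i i2 = _mm_cvtps_epi32(in[2]);
    __m128i i3 = _mm_cvtps_epi32(in[3]);

    /* 32 bit -> 16 bit with signed saturation (packssdw) */
    __m128i w01 = _mm_packs_epi32(i0, i1);
    __m128i w23 = _mm_packs_epi32(i2, i3);

    /* signed 16 bit -> unsigned 8 bit with saturation (packuswb) */
    __m128i px = _mm_packus_epi16(w01, w23);

    /* unaligned store of 16 pixels */
    _mm_storeu_si128((__m128i *)dst, px);
}
```

With this ordering, negative results clamp to 0 and results above 255 clamp to 255, as long as the float values stay within the int32 range.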