
I was recently looking at some assembly code that used `psadbw` and `pshufd` on xmm registers, which are 128-bit registers.

According to the x86 documentation here, the instruction has the form `pshufd xmm0, xmm0, imm8`, but I fail to understand in which cases it would be used in practice.

Similarly, `psadbw` computes the sum of absolute differences of bytes. So for a 128-bit register (16 bytes), if my instruction is `psadbw xmm0, xmm5`, does it compute |1st byte of xmm5 - 1st byte of xmm0| + |2nd byte of xmm5 - 2nd byte of xmm0| + ... + |16th byte of xmm5 - 16th byte of xmm0|? If not, how does it work?

Can anyone provide a practical scenario?

user8877134
  • `psadbw` has two groups of 8. Don't you already have a practical scenario, since you encountered them in actual code? – harold Nov 02 '17 at 21:07
  • @harold - Just confirming: do the two groups work like 1st group = `|1st byte of xmm5 - 1st byte of xmm0| + ... + |8th byte of xmm5 - 8th byte of xmm0|`, 2nd group = `|9th byte of xmm5 - 9th byte of xmm0| + ... + |16th byte of xmm5 - 16th byte of xmm0|`, and then xmm0 ends up as `1st group (2 bytes) + 0 (6 bytes) + 2nd group (2 bytes) + 0 (6 bytes)`? – user8877134 Nov 02 '17 at 21:25
  • Yes (I did not read it too carefully). Of course, you could have found this out by reading the relevant page on the website you were using: [psadbw](http://x86.renejeschke.de/html/file_module_x86_id_253.html) – harold Nov 02 '17 at 21:28
  • @harold - Thanks. I still do not understand `pshufd`. How can it be described in the way I described `psadbw`, and what does `encoding in imm8` mean in the [manual](http://x86.renejeschke.de/html/file_module_x86_id_254.html)? – user8877134 Nov 02 '17 at 21:31
  • [This has a simple C equivalent for the shuffle that `pshufd` does](https://stackoverflow.com/questions/37084379/convert-mm-shuffle-epi32-to-c-expression-for-the-permutation). See https://stackoverflow.com/questions/6996764/fastest-way-to-do-horizontal-float-vector-sum-on-x86 for a use-case in integer horizontal sums (e.g. at the end of a loop). `psadbw` is useful as a horizontal sum of a single register if you take the difference against 0. (Other than the obvious use-case in image/video processing when computing SAD on 8x8 blocks. e.g. in a motion search) – Peter Cordes Nov 02 '17 at 21:58
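To make the comments above concrete, here is a minimal scalar sketch in C of what the two instructions compute on 128-bit registers. The helper names `psadbw_model` and `pshufd_model` are made up for illustration; arrays are indexed from the least significant element, so byte 0 and dword 0 are the low ends of the register.

```c
#include <stdint.h>
#include <stdlib.h>

/* Scalar model of `psadbw dst, src` on xmm registers (hypothetical helper).
 * a and b hold the 16 bytes of the two operands. out[0] and out[1] are the
 * low and high 64-bit halves of the result. */
static void psadbw_model(const uint8_t a[16], const uint8_t b[16],
                         uint64_t out[2])
{
    for (int half = 0; half < 2; half++) {
        uint64_t sum = 0;
        for (int i = 0; i < 8; i++) {
            int d = (int)a[8 * half + i] - (int)b[8 * half + i];
            sum += (uint64_t)abs(d);
        }
        /* At most 8 * 255 = 2040, so only the low 16 bits of each
         * 64-bit half can be non-zero; the upper 48 bits are zero. */
        out[half] = sum;
    }
}

/* Scalar model of `pshufd dst, src, imm8` (hypothetical helper).
 * Each 2-bit field of imm8 selects which source dword goes to the
 * corresponding destination dword: bits 1:0 pick dst[0], bits 3:2 pick
 * dst[1], bits 5:4 pick dst[2], bits 7:6 pick dst[3]. */
static void pshufd_model(const uint32_t src[4], uint8_t imm8, uint32_t dst[4])
{
    uint32_t tmp[4]; /* temporary so that dst may alias src */
    for (int i = 0; i < 4; i++)
        tmp[i] = src[(imm8 >> (2 * i)) & 3];
    for (int i = 0; i < 4; i++)
        dst[i] = tmp[i];
}
```

For example, `imm8 = 0x4E` (binary `01 00 11 10`) swaps the two 64-bit halves of the register, and `imm8 = 0x1B` (binary `00 01 10 11`) reverses the order of the four dwords.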

1 Answer


Here is one practical scenario. The `psadbw` instruction maps to the `_mm_sad_epu8` SSE2 intrinsic, which is used in Chromium's fork of zlib (code).

Another use is in motion estimation: https://wiki.mozilla.org/SIMD/Uses/SAD
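As a small illustration of how the two instructions can work together, here is a sketch that sums the 16 bytes of a vector: `psadbw` against zero produces two partial sums, and `pshufd` brings the upper one down so they can be added. This is not the Chromium code; the function name `sum_bytes_sse2` is made up for the example, and it assumes an x86 target with SSE2 and a compiler that provides `<emmintrin.h>`.

```c
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdint.h>

/* Sum of 16 unsigned bytes starting at p (hypothetical helper name). */
static uint32_t sum_bytes_sse2(const uint8_t *p)
{
    /* Unaligned 128-bit load of the 16 input bytes. */
    __m128i v = _mm_loadu_si128((const __m128i *)p);

    /* psadbw against zero: |x - 0| = x, so each 64-bit half now holds
     * the sum of its own 8 bytes. */
    __m128i sad = _mm_sad_epu8(v, _mm_setzero_si128());

    /* pshufd with imm8 = 0x4E swaps the two 64-bit halves, moving the
     * upper partial sum into the low dword. */
    __m128i swapped = _mm_shuffle_epi32(sad, _MM_SHUFFLE(1, 0, 3, 2));

    /* Add the two partial sums and extract the low 32 bits. */
    __m128i total = _mm_add_epi32(sad, swapped);
    return (uint32_t)_mm_cvtsi128_si32(total);
}
```

The result is at most 16 * 255 = 4080, so it always fits in the low dword.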

Egor Pasko