6

Intel provides several SIMD intrinsics that all seem to perform a bitwise XOR on 128-bit data:

_mm_xor_pd(__m128d, __m128d)
_mm_xor_ps(__m128, __m128)
_mm_xor_si128(__m128i, __m128i)

Don't bitwise operations just operate on raw bits? Why are there three operations that take different types but the same data size?

Peter Cordes
jiandingzhe

2 Answers

4

_mm_xor_pd(__m128d, __m128d) operates on two 64-bit double-precision floats

https://msdn.microsoft.com/en-us/library/w87cdc33%28v=vs.90%29.aspx

_mm_xor_ps(__m128, __m128) operates on four 32-bit single-precision floats

https://msdn.microsoft.com/en-us/library/ss6k3wk8(v=vs.90).aspx

_mm_xor_si128(__m128i, __m128i) operates on one 128-bit value

https://msdn.microsoft.com/en-us/library/fzt08www%28v=vs.90%29.aspx

An XOR can be applied to any two bit patterns regardless of their format. Why three? It is a balance between supporting the common data types (float, double, and 128-bit integer) and not having too many instructions.

The balance is in the amount of silicon used, as each set of operations may execute in a separate functional unit (integer, float, double). If they use different silicon, the different types of operation can all execute in parallel.
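To make the "regardless of their format" point concrete, here is a minimal sketch in C, assuming an x86 compiler with SSE2 available. The helper name `xor_variants_agree` is hypothetical and only for illustration. It runs the same 16 bytes through all three intrinsics and checks that the results are bit-identical.

```c
#include <emmintrin.h>  /* SSE2; also provides the SSE _ps intrinsics */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 1 if _mm_xor_ps, _mm_xor_pd and
   _mm_xor_si128 produce the same 16 result bytes for a and b, else 0. */
int xor_variants_agree(const uint8_t a[16], const uint8_t b[16])
{
    /* Load the raw bytes once as integer vectors (unaligned loads). */
    __m128i ai = _mm_loadu_si128((const __m128i *)a);
    __m128i bi = _mm_loadu_si128((const __m128i *)b);

    /* The cast intrinsics only change the C type; no bits are modified. */
    __m128i r_si = _mm_xor_si128(ai, bi);
    __m128  r_ps = _mm_xor_ps(_mm_castsi128_ps(ai), _mm_castsi128_ps(bi));
    __m128d r_pd = _mm_xor_pd(_mm_castsi128_pd(ai), _mm_castsi128_pd(bi));

    /* Store all three results as bytes so they can be compared. */
    uint8_t out_si[16], out_ps[16], out_pd[16];
    _mm_storeu_si128((__m128i *)out_si, r_si);
    _mm_storeu_si128((__m128i *)out_ps, _mm_castps_si128(r_ps));
    _mm_storeu_si128((__m128i *)out_pd, _mm_castpd_si128(r_pd));

    return memcmp(out_si, out_ps, 16) == 0 &&
           memcmp(out_si, out_pd, 16) == 0;
}
```

This only demonstrates that the three results match bit for bit. It says nothing about which execution unit runs each instruction or how fast it is.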

Tim Child
  • 1
    On most CPUs, `pxor` (`_mm_xor_si128`) can execute on any vector ALU port. On some CPUs (like Intel before Skylake), the FP ones can only execute on one of the ports. So it's not like one execution port each. It's not to "support balance", it's because of connecting the FP ones to the FP bypass forwarding network, and the integer one to the integer network, for better latency when forwarding between instructions like `paddb` instead of `addps`. See [What is the point of SSE2 instructions such as orpd?](https://stackoverflow.com/q/62111946) and the other linked duplicates. – Peter Cordes Jan 23 '22 at 11:42
2

From a strict C point of view, they are all different because of the types.

They might also be hints to the CPU about which kind of data you intend to work with. At least, this is the best interpretation the experts have come up with. As they said, though, this needs to be checked on hardware.
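As an illustration of the type distinction, here is a small sketch in C, assuming SSE2 and typical default compiler settings. The function names `negate4_ps` and `negate4_si128` are hypothetical. Both flip the sign bit of four floats, but the float version needs no casts, while the integer version has to convert between `__m128` and `__m128i` explicitly.

```c
#include <emmintrin.h>  /* SSE2; also provides the SSE _ps intrinsics */
#include <stdint.h>

/* Flip the sign of four floats by XORing the sign bit. Every operand
   is already __m128, so no casts are needed. */
void negate4_ps(const float in[4], float out[4])
{
    const __m128 sign_mask = _mm_set1_ps(-0.0f);  /* only the sign bit set */
    _mm_storeu_ps(out, _mm_xor_ps(_mm_loadu_ps(in), sign_mask));
}

/* Same result through the integer intrinsic. Passing a __m128 straight
   to _mm_xor_si128 is normally a type error, so the value is cast to
   __m128i and back. The casts do not change any bits. */
void negate4_si128(const float in[4], float out[4])
{
    const __m128i sign_mask = _mm_set1_epi32(INT32_MIN);  /* 0x80000000 per lane */
    __m128i bits = _mm_castps_si128(_mm_loadu_ps(in));
    _mm_storeu_ps(out, _mm_castsi128_ps(_mm_xor_si128(bits, sign_mask)));
}
```

Both functions write the same values to `out`. Whether one is preferable on a given CPU is a hardware question this sketch does not answer.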

AntoineL