What is the best way to perform branching using Intel SSE?

Question

I'm writing a compiler and I have to output code for branching conditions on float values. For example, to compile this kind of code:

if(a <= b){
    //1. DO something
} else {
    //2. Do something else
}

where a and b are float variables. I just need to jump to 2 if the condition is not true, else fall through to 1. I'm considering here optimization at the compiler level, taking into account what's in 1 and 2.

I need something that works with all the comparison operators >, >=, <, <=, == and !=

One way I found to make the comparison is to use CMPLTSD (and the equivalent instructions for the other relational operators). But with that, I have to dedicate an SSE register to the result, then move its value into a general-purpose register (eax, for example), and finally compare that value with 0.

I also saw that the UCOMISD instruction should set the flags correctly, but apparently it doesn't work the way I thought.

So, what's the best way to handle code like that? Are there better instructions than the first solution I have?

By best, I mean, the general solution to this problem. If possible, I would like to have code behave the same way as when doing comparisons on integers (cmp a, b; jge label). Of course, I would prefer the fastest instructions to achieve that.

The best way to do it *depends on what you are doing*. As in, what is inside the `//DO something` block? "The best way" often depends on looking at the whole picture, not trying to translate your code line by line. — jalf, Mar 04 '12 at 19:48
If you actually want to branch, UCOMISD (which is actually SSE2) does appear to be the answer, what's the problem with it? The Unordered result? — harold, Mar 04 '12 at 20:02
The problem with UCOMISD is that I don't know how to jump according to the result of the comparison. I tried jumping with jle, but I didn't get the expected result. Do I have to use special conditional jump instructions ? — Baptiste Wicht, Mar 04 '12 at 20:19

score 7 · Accepted Answer · answered Mar 04 '12 at 20:25

The condition codes for ucomisd do not correspond to signed integer comparison codes, but to unsigned ones (with "unordered" in the parity flag). It's a bit strange, I admit, but all clearly documented. The code if you actually want to branch could be something like this for <=:

  ucomisd a,b
  ja else     ; greater
  jp else     ; unordered
  ; code for //1 goes here
  jmp end
else:
  ; code for //2 goes here
end:

For <:

  ucomisd a,b
  jae else   ; greater or equal
  jp else    ; unordered

I could list them all if you really want, but you can just look at the condition codes for ucomisd and match them to the jump you need.
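For reference, the matching described above can be sketched as a small C model covering all six operators. This is a sketch under stated assumptions, not a definitive implementation: `ucomisd_model`, the `fp_*` predicates and `check_pair` are hypothetical names, and the model assumes the documented flag results for `ucomisd a,b` (unordered sets ZF, PF and CF; a > b clears all three; a < b sets CF; a == b sets ZF). The comment on each predicate names one possible jump sequence, with `else` taken when the condition is false.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* The three flags UCOMISD reports its result in. */
typedef struct { bool zf, pf, cf; } Flags;

/* Hypothetical model of `ucomisd a, b` (Intel operand order). */
static Flags ucomisd_model(double a, double b) {
    Flags f = { false, false, false };   /* a > b:     ZF=0 PF=0 CF=0 */
    if (isnan(a) || isnan(b)) {
        f.zf = f.pf = f.cf = true;       /* unordered: ZF=1 PF=1 CF=1 */
    } else if (a < b) {
        f.cf = true;                     /* a < b:     CF=1 */
    } else if (a == b) {
        f.zf = true;                     /* a == b:    ZF=1 */
    }
    return f;
}

/* Each predicate is true when no jump to the else label is taken. */

/* a <  b : jae else ; jp else */
static bool fp_lt(double a, double b) { Flags f = ucomisd_model(a, b); return f.cf && !f.pf; }
/* a <= b : ja else ; jp else */
static bool fp_le(double a, double b) { Flags f = ucomisd_model(a, b); return (f.cf || f.zf) && !f.pf; }
/* a >  b : jbe else (unordered sets CF, so no parity check is needed) */
static bool fp_gt(double a, double b) { Flags f = ucomisd_model(a, b); return !f.cf && !f.zf; }
/* a >= b : jb else (same reasoning) */
static bool fp_ge(double a, double b) { Flags f = ucomisd_model(a, b); return !f.cf; }
/* a == b : jne else ; jp else */
static bool fp_eq(double a, double b) { Flags f = ucomisd_model(a, b); return f.zf && !f.pf; }
/* a != b : jp then ; je else (unordered counts as "not equal") */
static bool fp_ne(double a, double b) { Flags f = ucomisd_model(a, b); return !f.zf || f.pf; }

/* Checks all six predicates against the C operators for one pair. */
static void check_pair(double a, double b) {
    assert(fp_lt(a, b) == (a <  b));
    assert(fp_le(a, b) == (a <= b));
    assert(fp_gt(a, b) == (a >  b));
    assert(fp_ge(a, b) == (a >= b));
    assert(fp_eq(a, b) == (a == b));
    assert(fp_ne(a, b) == (a != b));
}
```

Calling `check_pair` with ordinary values and with `NAN` on either side is a quick way to confirm that a chosen jump sequence agrees with C's comparison semantics before emitting it from a compiler.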

This is indeed strange... But I think I will get it with the documentation. Thanks a lot. — Baptiste Wicht, Mar 04 '12 at 20:28

score 2 · Answer 2 · edited May 23 '17 at 12:17

Important: @harold's answer is almost exactly right, but it has one subtle problem that may drive you crazy in a very important edge case later on: the NaN treatment is backwards from most languages (like C++).

As @harold says correctly, the unordered compare result is stored in the parity flag.

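However, the compare result is unordered whenever either operand is NaN, as detailed in this Stack Overflow post, and in that case ZF, PF and CF are all set. That means a jump that tests only the carry or zero flag (jb, jbe, je) is taken for NaN too, so NaN can appear to be less than or equal to absolutely every number, including NaN.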

So if you want your language to match C++'s behavior, where any comparison with NaN (other than !=) returns false, you want:

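For <= (assuming the operands are swapped, with b in xmm0 and a in xmm1):

ucomisd xmm0, xmm1   ; compare b with a
jb else_label        ; b < a, or unordered

For < (same register assignment):

ucomisd xmm0, xmm1   ; compare b with a
jbe else_label       ; b <= a, or unordered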

Confirmed in the following gcc disassembly, where I return a >= b:

144e:       66 0f 2e c8             ucomisd %xmm0,%xmm1
1452:       0f 93 c0                setae  %al

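Here it uses setae, which is the register-modifying equivalent of jae. It then immediately returns without inspecting the parity flag. Note that this disassembly is in AT&T syntax, so the operand order is reversed relative to the Intel-syntax snippets above.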

For why it's ja and not jg, @harold's answer is still a clear and correct explanation.

And of course, you don't have to arrange the compare this way. If you keep the operand order from the previous answer and test only the carry and zero flags, leaving out the parity check, NaN operands will satisfy <, <= and == (even NaN < NaN comes out true!). Adding the jp check avoids that, but as you can see, it may be a little bit slower since it requires an additional jump.
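The single-jump idea can be sanity-checked with a small C model. This is only a sketch: `ucomisd_model`, `lt_swapped`, `le_swapped` and `lt_no_parity` are hypothetical names, and the model assumes the documented flag results (unordered sets ZF, PF and CF; first operand greater clears all three; first operand less sets CF; equal sets ZF). It compares b against a, so that the unordered case lands on the same side as "false", and contrasts that with an unswapped compare that skips the parity check.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

typedef struct { bool zf, pf, cf; } Flags;

/* Hypothetical model of `ucomisd x, y` (Intel operand order). */
static Flags ucomisd_model(double x, double y) {
    Flags f = { false, false, false };   /* x > y:     all clear */
    if (isnan(x) || isnan(y)) {
        f.zf = f.pf = f.cf = true;       /* unordered: all set */
    } else if (x < y) {
        f.cf = true;
    } else if (x == y) {
        f.zf = true;
    }
    return f;
}

/* a < b via `ucomisd b, a ; jbe else`: then-block runs iff CF=0 and ZF=0. */
static bool lt_swapped(double a, double b) {
    Flags f = ucomisd_model(b, a);
    return !f.cf && !f.zf;
}

/* a <= b via `ucomisd b, a ; jb else`: then-block runs iff CF=0. */
static bool le_swapped(double a, double b) {
    Flags f = ucomisd_model(b, a);
    return !f.cf;
}

/* a < b via `ucomisd a, b ; jae else` with NO parity check:
   the then-block runs iff CF=1, which includes the unordered case. */
static bool lt_no_parity(double a, double b) {
    Flags f = ucomisd_model(a, b);
    return f.cf;
}
```

In this model the swapped forms agree with C's `<` and `<=` for NaN operands, while `lt_no_parity(NAN, NAN)` comes out true, which is the pitfall described above.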
