8

I recently wrote a program that sorts an array. For it, I needed to write a comparison function, which I will pass into it. My comparison function should have returned 1(if x > y), -1(if x < y) or 0(if x = y). I wrote a regular function (Function 1) using conditional expressions, but I was advised to write differently (Function 2). Is it better to write like that? Will a Boolean condition always return 1 for the truth?(I mean if x=0 and y=0 will we always havе (x==y)==1 ?)

Function 1:

int Icmp(void* x, void* y)
{
    int a = *(int*)x;
    int b = *(int*)y;
    if (a > b)
        return 1;
    else if (a < b)
        return -1;
    else
        return 0;
}

Function 2:

int Icmp(void* x, void* y)
{
    return (*(int*)x > * (int*)y) - (*(int*)x < *(int*)y);
}
Skiv Hisink
  • 164
  • 2
  • 12
  • 5
    The nominal advantage of the second is that it avoids conditional jumps and may perform better. The notation used is ghastly; the variables should be localized as in the first and then used in the computation. – Jonathan Leffler Oct 15 '19 at 04:56
  • 4
    The second one is a common trick to force it branchless on stupid computers. Combine the readability of the first to get the best of both worlds. Edit: i.e. as Jonathan said above – Antti Haapala -- Слава Україні Oct 15 '19 at 04:57
  • 8
    Both techniques work. Measure the performance difference carefully. It’ll probably be hard to detect. Prefer clarity when the difference is marginal. – Jonathan Leffler Oct 15 '19 at 04:59
  • 1
    Ask the one who advised you why. Ask in which situations the stated advantages occur. Ask why those siutations are expected to be prominent for you. If you do not get satisfying answers for all those questions, then find another advisor. Consider measuring the stated advantages, if possible. Consider yourself how important the advantages are which you are yourself aware of, most importantly "I understand the code" and "I think all my fellows understand the code better". Then decide yourself. – Yunnosch Oct 15 '19 at 05:13
  • I don't like the second one because it assumes that a true comparison is equal to 1. I know it *shall* be true in C99, but not every compiler are C99 compliant. And if you really need performance, use assembly code! – Joël Hecht Oct 15 '19 at 05:30
  • What I don't like is passing the integer arguments as pointers. That is something you should only do if the function aims to alter the values. – Cheatah Oct 15 '19 at 05:35
  • 2
    @JoëlHecht if it is not 1 then it is not a C compiler. Why would you want to use a non-C compiler to compile C code?! – Antti Haapala -- Слава Україні Oct 15 '19 at 05:38
  • @Cheatah this is a comparison function used in qsort, bsearch. – Antti Haapala -- Слава Україні Oct 15 '19 at 05:39
  • I don't get why people do vote this to be closed with **too broad** when my answer is shorter than the question itself. – Antti Haapala -- Слава Україні Oct 15 '19 at 06:07
  • Note the example code on the reference for [`qsort`](https://en.cppreference.com/w/c/algorithm/qsort) - it covers exactly this. – Sander De Dycker Oct 15 '19 at 06:48
  • 1
    @JonathanLeffler If the compiler isn't able to optimize out the branches of 1) then I doubt it would be able to optimize the branches of 2) anyway. Boolean comparisons typically result in a branch, regardless of if statements. – Lundin Oct 15 '19 at 06:59
  • @JoëlHecht What has C99 to do with comparisons or integers? – Lundin Oct 15 '19 at 07:00
  • @Lundin : c99 introduces bool, which is the result of a comparison. Before c99 a comparison result was a simple int. – Joël Hecht Oct 15 '19 at 11:31
  • @JoëlHecht No, that's wrong. Perhaps you are mixing up C and C++? The result of comparisons in C is of type `int` still. – Lundin Oct 15 '19 at 11:49
  • @Lundin : damn you're right. I don't know from where this confusion comes to my mind... – Joël Hecht Oct 15 '19 at 12:57

4 Answers4

14

The preferred way to write the non-branching code would be to use a local variable for the operands:

int icmp(const void *x, const void *y)
{
    int a = *(const int *)x;
    int b = *(const int *)y;
    return (a > b) - (a < b);
}

The expression is a common idiom in comparison functions, and if written using variables instead of in-place pointer dereferences it is rather readable too.

The code relies upon the fact that the result of a comparison using >, < or even == is of type int and either 1 or 0. This is required by the C standard - any compiler that generates values like 42 or -1 is by definition not a C compiler.

It is easy to see that max. one of a > b or a < b can be true at a given time, and the result is either 1 - 0, 0 - 1 or 0 - 0.

As to why branchless code - while compilers might generate the exact same code for both functions, they often do not. For example latest GCC and ICC both seem to generate a branch for the first function on x86-64, but branchless code with conditional execution for the latter. And to anyone who says that branches do not matter, then I refer you to the highest-voted QA ever on Stack Overflow.

  • This is the correct way to do it. It is very readable, the compiled machine-code is identical and it avoids branches in favor of a little additional arithmetic. The branch predictor can cause a lot of performance-issues when sorting huge arrays which are in a high state of entropy. This avoids that issue. – 3ch0 Oct 15 '19 at 05:58
  • 5
    Just a note that the absence of an `if`-statement doesn't mean that no branch will be generated, nor that if there is an `if`-statement that there will be a branch. It indeed depends on whether the compiler can optimize them away. – G. Sliepen Oct 15 '19 at 06:15
  • @G.Sliepen actually ICC would be perfectly capable of optimizing them out but it came to a conclusion that the branch would not probably be taken because it was written using branches... – Antti Haapala -- Слава Україні Oct 21 '19 at 11:40
3

Is it better to write like that?

I'd say no.

For performance; either it doesn't matter (likely for modern compilers), or it shouldn't be a separate function (and should be built into the code used for sorting), or you shouldn't be sorting at all (e.g. data sorted at creation and not sorted after creation).

For readability (code maintenance, chance of seeing errors in the original version, risk of introducing errors later) I'd prefer your original version; especially when working in a team, and especially when other team members are more familiar with 10 other programming languages that each has very different rules to C.

Specifically; I like this (because casts in actual code make things harder to read):

    int a = *(int*)x;
    int b = *(int*)y;

..and I would rewrite the rest to look like this:

    if (a > b) {
        return 1;
    }
    if (a < b) {
        return -1;
    }
    return 0;
}

..or to look like this:

    if (a > b) return 1;
    if (a < b) return -1;
    return 0;
}

..because else is unnecessary after a return; and because "if without braces followed by statement on its own line" creates a risk of someone accidentally inserting a new line without realizing it and breaking everything (for an example, see https://dwheeler.com/essays/apple-goto-fail.html ).

Brendan
  • 35,656
  • 2
  • 39
  • 66
0

If you are using the comparison function with qsort, the function only needs to return +ve, -ve or zero values.

In that case, you can just subtract the numbers

int Icmp(const void* x, const void* y)
{
    return (*(int*)x - *(int*)y);
}

Rishikesh Raje
  • 8,556
  • 2
  • 16
  • 31
  • This is indeed more efficient than `(a > b) - (a < b)`, but the problem with `a - b` is that the subtraction may underflow. This is thus a bad expression. – chmike Jan 09 '20 at 11:10
-1

Integers

echo 1 <=> 1; // 0
echo 1 <=> 2; // -1
echo 2 <=> 1; // 1

Floats

echo 1.5 <=> 1.5; // 0
echo 1.5 <=> 2.5; // -1
echo 2.5 <=> 1.5; // 1

Strings

echo "a" <=> "a"; // 0
echo "a" <=> "b"; // -1
echo "b" <=> "a"; // 1
Milan Maniya
  • 101
  • 2
  • The question is about how to compare integers in C. Your answer is clearly not valid C code, and thus doesn't provide useful information to the person that asked the question. Please make sure your answers are on-topic. – G. Sliepen Oct 21 '19 at 11:50