3

I saw this really interesting tweet:

resisting my code golf instinct to turn if(!bool1 && bool2) into if(bool1<bool2)

I had never seen that before, so I wanted to see if compilers would also use this optimization. I started a repo with a README and a test C program: https://github.com/ndbroadbent/gcc_experiments

Here is the test program:

#include <stdlib.h>
#include <stdio.h>
#include <stdbool.h>

int main(int argc, const char* argv[]) {
  if(argc != 3) {
    printf("Usage: %s <a> <b>\n", argv[0]);
    exit(1);
  }
  bool a = strtol(argv[1], NULL, 10) != 0;
  bool b = strtol(argv[2], NULL, 10) != 0;

  if (!a && b) {
    printf("!a && b == true (a: %d, b: %d)\n", a, b);
  } else {
    printf("!a && b == false (a: %d, b: %d)\n", a, b);
  }
}

I tried compiling this program with both the gnu90 and C99 standards. I know C99 has a bool type, but is that still treated like an integer, so the compiler can't make any optimizations based on boolean logic?

I might be reading the assembly wrong, but it looks like C99 with -O3 is also including jne and je instructions, instead of just using one "less than" operation and a single jump instruction. It looks like C++ doesn't make this optimization either.

ndbroadbent
  • 13,513
  • 3
  • 56
  • 85
  • 5
    "I tried compiling this program with both GCC and C99." - What do you mean? GCC is a compiler and C99 is a standard. – klutt Nov 21 '18 at 17:51
  • Sorry I just mean I used both `gcc` and `c99` on the command line. So the default for `gcc` is apparently `-std=gnu90`, and calling `/usr/bin/c99` sets `-std=c99`. I updated my question to refer to the standards instead of the command-line tools I was running. – ndbroadbent Nov 21 '18 at 17:58
  • What is the assembly output then? – KamilCuk Nov 21 '18 at 18:03
  • 4
    if `bool1` is true, you're comparing to another value when you could just bail out with a single test. I guess it depends on the stats of the outcome of the booleans. – Jean-François Fabre Nov 21 '18 at 18:03
  • @DavidSchwartz I tried all of the different optimization options (from -O0 to -O3). I have added the output from `objdump` to my GitHub repo: https://github.com/ndbroadbent/gcc_experiments – ndbroadbent Nov 21 '18 at 18:05
  • Your latest edit modifies the apparent question to be about details of the cost of specific assembly instructions. I didn't think that was where you were aiming, but if that's where you really want to go, then this question is a [dupe](https://stackoverflow.com/questions/692718/how-many-cpu-cycles-are-needed-for-each-assembly-instruction). – John Bollinger Nov 21 '18 at 18:20
  • @JohnBollinger You're right, that wasn't really where I wanted to go. I saw that other question and answers, but they didn't really answer my question. – ndbroadbent Nov 21 '18 at 18:51
  • @ndbroadbent, at this point, then, I think you're better off accepting an answer and then asking a new question about what you really want to know. As you're probably aware, changing the question out from under existing answers is not well received around here. – John Bollinger Nov 21 '18 at 18:55
  • @JohnBollinger Hmm, so far none of the answers really explain why this optimization isn't being used. I've just performed some benchmarks and `a < b` is usually a bit faster than `!a && b` for random data. – ndbroadbent Nov 21 '18 at 18:56
  • I think @n.m.'s answer is actually pretty clear on that: the optimization is not performed because the compiler thinks the version it does emit is better. Whether it turns out to be right about that is likely to be context and hardware dependent. – John Bollinger Nov 21 '18 at 18:59
  • Ok good point, I will accept that answer then. – ndbroadbent Nov 21 '18 at 19:00

1 Answers1

7

Compilers are well aware of the eqivalence and are able to optimise based on it. Their idea of what should be optimised to what might be opposite from yours though.

For sake of completeness, here's the clang-produced assembly output of a function that does !a && b and also of a function that does “a < b”. It's the same assembly output in both cases.

    mov     eax, edi
    not     al
    and     al, sil
    ret
n. m. could be an AI
  • 112,515
  • 14
  • 128
  • 243
  • Haha awesome, that was a great way to show that the compiler knows they are equivalent. Thanks! – ndbroadbent Nov 21 '18 at 18:40
  • I should also point out that GCC 8.2 doesn't make this optimization, but clang/llvm seems to be smarter and knows that they are equivalent. – ndbroadbent Nov 21 '18 at 19:03
  • 1
    Linking to an off-site resource without explaining it in the answer is not good for Stack Overflow. – Eric Postpischil Nov 21 '18 at 19:05
  • Really interesting that it still uses the `jne` and `je` instructions for my original code, even if I change it to `a < b`: https://godbolt.org/z/bUfgEd On my CPU that's actually a little bit slower than the single `cmp` instruction. – ndbroadbent Nov 21 '18 at 19:06
  • that answer could include the snippets directly as well. Else it's link only even if it's good – Jean-François Fabre Nov 21 '18 at 19:50