1

I was reading somewhere that it is faster to use bitwise operators instead of if statements where possible. I am working on an image processing project and I have various methods for doing math on pixels. For instance when I add a pixel I would check and make sure the sum does not go over the maximum value. I changed it to be this...

    Pixel16 operator+(Pixel16 p) const noexcept
    {
        uint_fast32_t r = red + p.red;
        uint_fast32_t g = green + p.green;
        uint_fast32_t b = blue + p.blue;
        return Pixel16(r | -(r > 0xffff), g | -(g > 0xffff), b | -(b > 0xffff));
    }

Do you guys think it is faster than writing statements like...

if(r > 0xffff)
 r = 0xffff;

FYI red, green, and blue are member variables of type uint16_t

LogicStuff
  • 19,397
  • 6
  • 54
  • 74
chasep255
  • 11,745
  • 8
  • 58
  • 115
  • 1
    Who knows, try it, obviously it depends on what the compiler does with it and the characteristics of the platform and (when applicable) the predictability of that branch. If you're willing to accept SSE intrinsics, there's an intrinsic for "packed add words with unsigned saturation" – harold Apr 23 '15 at 15:33
  • 6
    Advice: first, write for maintainability, then if it's not fast enough for your need, after measuring which part is the bottleneck, try to optimise and **measure** the actual speedup. – Hiura Apr 23 '15 at 15:33
  • Why not run a release version of your code and test which one is faster? Also, on a side note -- you should implement `operator +=` first, and then implement `operator+` in terms of `operator +=`. Makes a much cleaner interface, plus you get two overloaded operators for practically the price of one. – PaulMcKenzie Apr 23 '15 at 15:34
  • Wouldnt it take more cycles to execute the bitwise operations than it would be to do the if statement? – Javia1492 Apr 23 '15 at 15:35
  • 1
    @Javia1492 how do you know what the if-statement compiles to? It could even be exactly the same as the bitwise version.. or a conditional move, or something else that isn't a branch, or it could be a branch but then you still don't know how long that branch takes because it depends on that platform and the predictability of the branch – harold Apr 23 '15 at 15:37
  • 1
    @Javia1492 [See this](http://stackoverflow.com/a/11227902/16287). – Drew Dormann Apr 23 '15 at 15:37
  • @DrewDormann Thanks for the link. Very good read and information. – Javia1492 Apr 23 '15 at 15:42
  • just measure in real task. result may differ depending on compiler and platform. real measurement can easily differ with words of any expert. – Vincenso Apr 23 '15 at 15:37

4 Answers4

2

Given this code:

#include <algorithm>
#include <cstdint>

struct Pixel16 {
    uint16_t red;
    uint16_t blue;
    uint16_t green;

    Pixel16(uint16_t red, uint16_t green, uint16_t blue);

};

Pixel16 v1(Pixel16 const & p, Pixel16 const & s) {
    uint_fast32_t r = p.red   + s.red;
    uint_fast32_t g = p.green + s.green;
    uint_fast32_t b = p.blue  + s.blue;
    return Pixel16(r | -(r > 0xffff), g | -(g > 0xffff), b | -(b > 0xffff));
}

Pixel16 v2(Pixel16 const & p, Pixel16 const & s) {
    uint_fast32_t r = p.red   + s.red;
    uint_fast32_t g = p.green + s.green;
    uint_fast32_t b = p.blue  + s.blue;

    r = std::min(r, (uint_fast32_t) 0xFFFF);
    g = std::min(g, (uint_fast32_t) 0xFFFF);
    b = std::min(b, (uint_fast32_t) 0xFFFF);

    return Pixel16(r, g, b);
}

What does my compiler give as a result?

Clang on OS X will generate functionally identical code for v1 and v2. The only difference is that the order that the calls to min() and the equivalent work in v1 occur in a different order.

In both cases, there are no branches.

Summary:

Write the code that is most understandable. Use functions and language features to express your code in a readable manner. Is your bitwise code more or less readable than a min() function?

Bill Lynch
  • 80,138
  • 16
  • 128
  • 173
1

In almost all cases "faster" will depend greatly on the target system and the compiler used, as well as how operators may be overloaded. In this case, though, I seriously doubt there will be that much of a difference, since you're still using comparison operators, which should be the more expensive operation than a simple if branch.

The code listed above, though, bothers me. Negating the result of a comparison operation (a boolean operation, unless you've overridden the operators) is not something that's safe to bitwise-or like you're doing. Besides, it's very difficult to understand, which means it'll be very difficult for someone to maintain later on. The if version, though, explains what's going on.

Joe Sewell
  • 306
  • 1
  • 11
  • *"doubt there will be that much of a difference, since you're still using comparison operators, which should be the more expensive operation than a simple if branch."* - comparisons are easily handled as subtractions followed by a test of the sign and/or zero bit in the CPU... if branches actually involve PC changes and are much slower. *"not something that's safe to bitwise-or"* - the conversion of a the `bool` comparison result is well defined - `false` undergoes conversion to 0, `true` to 1 - and entirely safe. Arguing it's unclear is reasonable but subjective, unsafe is just wrong. – Tony Delroy Apr 23 '15 at 15:44
  • @TonyD As I said, it's going to depend on the compiler and the target platform. In many cases, comparison operators generate conditional branches in code anyhow. That's the "test of the sign and/or zero bit in the CPU" you mentioned. Again, though, it depends on the CPU; I'm not assuming Intel here. As for the `bool` results, all it takes is overloading the `>` operator to ruin that assumption. – Joe Sewell Apr 23 '15 at 15:48
  • that misses the fact that the compiler can use those bits without branching on them, which is highly desirable because setting the bits is typically cheap and branching is - relatively - not. I'm not talking any particular CPU either... I've coded x86, 6502C, UltraSparc, 68000s, Z80 etc... And you can't overload `operator>` for inbuilt types like those in the question.... – Tony Delroy Apr 23 '15 at 15:52
1

To know, you have to measure (profile the code).

The theory as to why the bitwise operations might be faster is that branch prediction failures can be a significant performance hit. By using bitwise operations instead of control flow switches, you may be able to avoid a conditional branch altogether and thus you'll never have a branch prediction miss there.

Depending on the situation, it might also be slightly cache-friendlier to compute rather than branch.

But you won't know the actual effects unless you measure. Too many factors are in play to simply analyze the code.

Adrian McCarthy
  • 45,555
  • 16
  • 123
  • 175
0

This is what used to be called a microoptimisation thirty years ago, because it could save microseconds. Today I'd call it a nanooptimisation :-)

Unless you have reason to believe that the speed is affecting anyone, do what is more readable.

If speed is important, for example because you need to process 60 10 megapixel images per second, then you try out various methods and measure the execution time. The exact speed will depend on your compiler, on the processor, sometimes on pure good luck or bad luck. Obviously you use the same compiler optimisations settings that you use in production code.

And which code produces the fastest results may be different on different processors. So you leave all variants in your source code (including the most readable one), and use #ifdef to pick the known fastest one where you have measured it. Say your code is supposed to run on three current and five future devices. And you measured on two of the current devices. Then you make sure that on these two devices the fastest measured code is run. On the other devices it's a judgement call. If A was 40% faster than B on processor X, but 5% slower on processor Y, then you'd probably use A on all processors where the speed wasn't measured. And you add the measurement results as comments.

Now all that is a lot of work, so you only do it if the time saved is worth that effort.

gnasher729
  • 51,477
  • 5
  • 75
  • 98