3

For recolering the pixels in a bitmap image in my C# project I want to round my RGB values either up to a value of 255 or down to 0;

Each value is a byte.

Right now I am doing the following:

(byte)(Pixels[i] < 128 ? 0 : 255);

I am sure this could be achieved in a quicker way and without type casting using bitwise operations. How would I go about doing this?

Michael Frey
  • 908
  • 2
  • 14
  • 35
  • 4
    Well, there are only 256 of them. List all of them *in binary form* and then observe whether there is some feature that the ones less than 128 have that is different than ones greater than 127 have. Then use that feature. – Eric Lippert Jun 21 '14 at 14:46
  • Hmm, change to integer, shift right 7 bits so you have 0 or 1. Negate so you have 0 or -1. Convert back to byte, would that do it? Avoids the branching, but might actually be slower. – RenniePet Jun 21 '14 at 14:49
  • This is a legitimate question. The ternary statement constitutes a conditional branch, which hurts instruction-level parallelism in superscalar processors. – Douglas Jun 21 '14 at 14:50
  • @RenniePet or cast to `sbyte`, shift, cast back to `byte`, skips the negation step – harold Jun 21 '14 at 14:51
  • @Douglas there's too much cruft for it to be a legitimate, to-the-point question and it shows no research effort. All the pixel part is irrelevant, or the pixel-specific problem must be explained. Otherwise it's just a question about binary operations, which I'm sure has been answered before. – CodeCaster Jun 21 '14 at 14:53
  • 4
    You have changed your question to now ask for the *quicker* solution. Remember that performance questions cannot be answered **without trying it on your actual hardware**, remember that you need to optimize **the slowest thing** -- this might not be it -- and remember also that you don't want the faster solution, you want the *fast enough* solution. Set a goal, and check to see if you've met your goal. If you've met your goal, *go to the beach*, it's a beautiful day. Don't waste time making code faster that is already fast enough. – Eric Lippert Jun 21 '14 at 14:53
  • 256 Options -> Tried to use a lookup table? But might be not the best option, never tried it. – Felix K. Jun 21 '14 at 14:54
  • 2
    @EricLippert: Go to the beach? Is that supposed to be more obsessively compulsively satisfying than fiddling with bits? – RenniePet Jun 21 '14 at 14:57

3 Answers3

7
   (byte)(Pixels[i] < 128 ? 0 : 255)

Yes, this tends to perform poorly due to poor branch prediction if the bitmap contains too much random data. The jitter doesn't generate a conditional move for a statement like this.

You can use a trick, a right-shift preserves the sign bit for signed integral values. Which makes this code work:

   (byte)((sbyte)Pixels[i] >> 7)

Which generates code without a branch:

000000a7  movsx       eax,byte ptr [edx+eax+8]  ; Pixels[i], sign extended to 32-bits
000000ac  sar         eax,7                     ; >> operator
000000af  and         eax,0FFh                  ; (byte) cast
Hans Passant
  • 922,412
  • 146
  • 1,693
  • 2,536
  • 1
    You probably need `unchecked` for the final cast, no? – TheBuzzSaw Jun 21 '14 at 14:55
  • Are bytes some kind of exception to the rule? I just know I needed `unchecked` when casting from `Int32` to `UInt32` because I _wanted_ the negative value to just flip up to a positive one. – TheBuzzSaw Jun 21 '14 at 14:57
  • 1
    Sure, IL does not support operations on the smaller value types. The result of the expression is *int*, not *sbyte*. – Hans Passant Jun 21 '14 at 15:01
  • You can do this (probably) even faster for 4 values at once using a 32-bit int like this: `((pixel[i] >> 7) & 0x01010101) * 0x000000FF` and something similar on 8 values at once in 64-bit. You just shift the msb of each byte down and mask out the high order bits of each byte. The multiplication fans the now-lsb to all the other bit positions in the byte. – Apriori Jun 24 '14 at 19:03
5

A few possibilities:

  • (byte)((b << 24) >> 31);
  • (byte)((sbyte)b >> 31);
  • (uint)(int)(sbyte)b >> 24;

With the first two the trick is to map large numbers to negative values, then use a signed right shift to turn the result to either -1 or 0 and finally cast back to byte.

The last one is even nicer in theory, since it could be compiled to movsx eax,... shr eax, 24 without the masking in the end. But I doubt that the .NET JITter realizes that.

Should only be used in an unchecked context.

CodesInChaos
  • 106,488
  • 23
  • 218
  • 262
3

You might get the best performance by using a (pre-populated) lookup table.

var lookup = Enumerable.Range(0, 256).Select(i => i < 128 ? (byte)0 : 255).ToArray();

Since you only have 256 values, this can reside in L1 cache, meaning that its access is no more expensive than an arithmetic computation:

Pixels[i] = lookup[Pixels[i]];
Douglas
  • 53,759
  • 13
  • 140
  • 188
  • 1
    From my experience, the array bounds check slows code like this down below the performance of shift based solutions – CodesInChaos Jun 21 '14 at 15:00
  • @CodesInChaos: Good point; however, since the bounds check will always succeed, it is likely that its cost can be amortized by the branch predictor. – Douglas Jun 21 '14 at 15:02
  • My experience is that even though bounds checks always succeed and the branch predictor thus works well, they're still costly in practice. The JITter can sometimes optimize them out, but I'm pretty sure it won't do so in your example. – CodesInChaos Jun 21 '14 at 15:11