
I have a program that uses the following two functions 99.9999% of the time:

unsigned int getBit(unsigned char *byte, unsigned int bitPosition)
{
    return (*byte & (1 << bitPosition)) >> bitPosition;
}


void setBit(unsigned char *byte, unsigned int bitPosition, unsigned int bitValue)
{
    *byte = (*byte | (1 << bitPosition)) ^ ((bitValue ^ 1) << bitPosition);
}

Can this be improved? The processing speed of the program mainly depends on the speed of these two functions.

UPDATE
I will do a benchmark for each provided answer below and write the timings I get. For reference, the compiler used is gcc on the Mac OS X platform:

Apple LLVM version 5.1 (clang-503.0.40) (based on LLVM 3.4svn)

I compile without any specific arguments like: gcc -o program program.c
If you think I should set some optimizations, feel free to suggest.

The CPU is:
2.53 GHz Intel Core 2 Duo

While processing 21.5 MB of data with my originally provided functions it takes about:
Time: 13.565221
Time: 13.558416
Time: 13.566042

Time is in seconds (these are three tries).

-- UPDATE 2 --

I've used the -O3 optimization (gcc -O3 -o program program.c) option and now I'm getting these results:
Time: 6.168574
Time: 6.170481
Time: 6.167839

I'll redo the other benchmarks now...

Ivan Kovacevic
    "The processing speed of the program mainly depends on the speed of these two functions." - have you benchmarked that? – Mitch Wheat Jun 02 '14 at 10:30
    99.9999% doesn't sound like real data. Did you get real timing data? – user2357112 Jun 02 '14 at 10:31
    use bitmask instead of bitposition – Spektre Jun 02 '14 at 10:32
  • @Mitch Wheat: Well I wrote that because there is basically nothing else in the program. It is just reading bits and setting bits from particular bytes stored in files. There is also a special requirement that it needs to be done one byte at a time. Therefore I was just wondering if there is a way to change these two functions to improve performance. – Ivan Kovacevic Jun 02 '14 at 10:33
    If these bits are being read from and written to files, I would be quite surprised if this is your bottleneck. Most of the time is probably spent waiting for I/O. – user2357112 Jun 02 '14 at 10:36
  • It was a bit of a bad wording from my side. Actually the file is first completely loaded in memory. But because of the specific algorithm, the processing needs to be done byte by byte. Another note, I'm aware that these functions might be needlessly complicated, that is also a reason why I asked. An advice to simplify them is more than welcome! – Ivan Kovacevic Jun 02 '14 at 10:40
    It would probably be a good idea to show the calling code, as there may be optimisations that can be made in the way that these functions are called (e.g. combine multiple function calls into one). – Paul R Jun 02 '14 at 10:51
  • I wish I could do that, unfortunately it is proprietary code :-/ I hope that will not make people mad here. I'm now testing the provided answers and I will give exact benchmarks in the comments below each of them. I'm thankful for any insight that I receive here and also sorry I could not reveal the full code. – Ivan Kovacevic Jun 02 '14 at 11:08
    Depending on how code is called, it may be faster to not use pointer arguments. If `byte` is temporary, getting address of it might force the compiler to place it on stack instead of register. Also make functions local to compilation unit with `static`. It helps compiler to inline it. – user694733 Jun 02 '14 at 11:12
    After update: You are doing this the wrong way. Enable compiler optimizations **first**, then optimize by hand if there is still need to do so. – user694733 Jun 02 '14 at 11:21
  • OK I would like to proceed in that way, however I'm quite inexperienced with C and the specific compiler. Could you suggest some optimizations I could set? I will also google in parallel. – Ivan Kovacevic Jun 02 '14 at 11:23
    I suggest that you figure out how to run this program under a profiler that will tell you *exactly* what routines are taking up the most time. Programs often spend large amounts of time in unexpected places! :-) – Bob Jarvis - Слава Україні Jun 02 '14 at 11:26
    @IvanKovacevic I am not expert on gcc or LLVM, but I assume that using any of the basic `-O2`, `-Os` or `-O3` options would be [a good starting point](https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html). – user694733 Jun 02 '14 at 11:28
  • If those functions are consuming 99.9whatever% of your time then your overall algorithm is improperly designed. – Hot Licks Jun 02 '14 at 14:43
  • two good ways to speed up the code. 1) use register and restrict attributes. 2) make the code inline so as to eliminate the call/return processing – user3629249 Jun 03 '14 at 02:55

4 Answers


If you want to stick with functions, then for the first one:

unsigned int getBit(unsigned char *byte, unsigned int bitPosition)
  {
  return (*byte >> bitPosition) & 1;
  }

For the second one:

void setBit(unsigned char *byte, unsigned int bitPosition, unsigned int bitValue)
  {
  if(bitValue == 0)
    *byte &= ~(1 << bitPosition);
  else
    *byte |= (1 << bitPosition);
  }

However, I suspect that the function call/return overhead will swamp the actual bit-flipping. A good compiler might inline these function calls anyways, but you may get some improvement by defining these as macros:

#define getBit(b, p) ((*(b) >> (p)) & 1)

#define setBit(b, p, v) (*(b) = ((v) ? (*(b) | (1 << (p))) : (*(b) & (~(1 << (p))))))

@user694733 pointed out that branch prediction might be a problem and could cause a slowdown. As such it might be good to define separate setBit and clearBit functions:

void setBit(unsigned char *byte, unsigned int bitPosition)
  {
  *byte |= (1 << bitPosition);
  }

void clearBit(unsigned char *byte, unsigned int bitPosition)
  {
  *byte &= ~(1 << bitPosition);
  }

And their corresponding macro versions:

#define setBit(b, p) (*(b) |= (1 << (p)))

#define clearBit(b, p) (*(b) &= ~(1 << (p)))

The separate functions/macros would be useful if the calling code hard-codes the value passed for the bitValue argument in the original version.

Share and enjoy.

  • Your version of `getBit` is clean – Mohit Jain Jun 02 '14 at 10:47
  • I also agree, your getBit is clean and I've tested it on my data now. It produces a speed improvement. With the original functions the time needed to process the data is 13.558416 seconds. With your getBit it takes 13.105047. However your setBit is slower than the original function and with it, it takes 14.398621 seconds to process the data. I've tried several times to get an average result... – Ivan Kovacevic Jun 02 '14 at 11:04
  • @IvanKovacevic - I added a macro implementation which may help improve things by eliminating the function call overhead (which a good compiler/optimizer might be able to eliminate by inlining anyways, but I have no idea what compiler/OS you're on (not that knowing would help but it's my excuse-of-the-moment :-)). – Bob Jarvis - Слава Україні Jun 02 '14 at 11:08
  • @IvanKovacevic - the `if` comparison may be slowing things down. Given that the comparisons are the same I wouldn't really expect the `?:` version in the macro to be much better, but as I said the elimination of the function call overhead may buy something if the compiler wasn't already inlining these calls. – Bob Jarvis - Слава Україні Jun 02 '14 at 11:14
  • @IvanKovacevic `setBit` might be slower due to [branch prediction](http://stackoverflow.com/a/11227902/694733). – user694733 Jun 02 '14 at 11:14
  • @Bob Jarvis: I've tested your functions now with -O3 optimization. getBit is again faster, macro getBit is even marginally faster. setBit is slower and the macro version throws me an error! I've tested it like this: unsigned char test = 't'; setBit(&test, 2, 0); – Ivan Kovacevic Jun 02 '14 at 11:46
  • @Bob Jarvis: And the compiler complains: Error: expression is not assignable setBit(&test, 2, 0); note: expanded from macro 'setBit' #define setBit(b, p, v) (v ? *b |= (1 << p) : *b &= ~(1 << p)) – Ivan Kovacevic Jun 02 '14 at 11:47
  • @IvanKovacevic - I've updated the macro definitions to parenthesize each and every usage of the macro parameters. – Bob Jarvis - Слава Україні Jun 02 '14 at 12:19
  • @Bob Jarvis: It is working now! However you are missing one parenthesis at the end of the setBit. The benchmark results for the setBit macro are faster than your setBit function but slower than my original function. I'm getting these times(three tries): 6.397554s, 6.473452s, 6.397889s. And as for the separated set and clear bit, unfortunately I can not use that since the values are not hard coded for the bitValue argument. – Ivan Kovacevic Jun 02 '14 at 12:44
    @IvanKovacevic - added extra paren to first definition of setBit macro. As far as one definition of setBit being faster or slower on your hardware, I suggest using whatever turns out to be fastest. You can define a macro version of your initial function as `#define setBit(b, p, v) (*(b) = (*(b) | (1 << (p))) ^ (((v) ^ 1) << (p)))` (and hopefully I counted the parens right this time :-). Share and enjoy. – Bob Jarvis - Слава Україні Jun 02 '14 at 14:10
  • I've played around with macros a bit now and it seems most of these parentheses are unnecessary. This works perfectly fine: #define setBit(b, p, v) *b = (*b | (1 << p)) ^ ((v ^ 1) << p) – Ivan Kovacevic Jun 02 '14 at 15:20
    @IvanKovacevic - you're correct that sometimes the extra parentheses are not needed. However, because macros are expanded through text substitution it's possible for a macro which appears to be perfectly acceptable to expand to something the compiler doesn't like - for example, if you remove the parentheses around b in `*(b)` and then pass `&test` in for b in the `setBit` macro, it will expand to `*&test` which may not be acceptable to the compiler. Over the years I've learned that putting parentheses around macro arguments is easy insurance. Share and enjoy. – Bob Jarvis - Слава Україні Jun 02 '14 at 17:09

How about:

#include <stdbool.h>  /* for bool */

bool getBit(unsigned char byte, unsigned int bitPosition)
{
    return (byte & (1 << bitPosition)) != 0;
}

No need to use a shift operator to "physically" shift the masked-out bit into position 0, just use a comparison operator and let the compiler deal with it. This should of course also be made inline if possible.

For the second one, it's complicated by the fact that it's basically "assignBit", i.e. it takes the new value of the indicated bit as a parameter. I'd try using the explicit branch:

#include <stdbool.h>  /* for bool */
#include <stdint.h>   /* for uint8_t */

unsigned char setBit(unsigned char byte, unsigned int bitPosition, bool value)
{
  const uint8_t mask = 1 << bitPosition;
  if(value)
    return byte | mask;
  return byte & ~mask;
}
unwind
  • Did you perhaps mean `return byte &= ~mask;`? – user694733 Jun 02 '14 at 10:44
  • @user694733 D'oh, yes of course. At least almost. :) Fixed, thanks. – unwind Jun 02 '14 at 10:48
  • Is the `const unsigned char` in the first example a typo or did you include it because of some const correctness reason? – Lundin Jun 02 '14 at 11:43
  • @Lundin Typo, removed it. Thanks. I don't believe in `const`-declaring scalar values in function prototypes. – unwind Jun 02 '14 at 11:49
  • @unwind: I've converted your functions to use pointers as to make the change in place(important only for setBit). Your getBit approach is faster than mine or the one from Bob Jarvis(Timings(three tries): 5.912803s, 5.912882s, 5.908337s). However your setBit is slower than my originally provided function(Timings: 6.451259s, 6.464742s, 6.448771s). – Ivan Kovacevic Jun 02 '14 at 13:02

Generally, these things are best left to the compiler's optimizer.

But why do you need functions for such trivial tasks? A C programmer should not be shocked when encountering basic stuff like this:

x |= 1<<n;      // set bit
x &= ~(1<<n);   // clear bit
x ^= 1<<n;      // toggle bit
y = x & (1<<n); // read bit

There is no real reason to hide simple things like these behind functions. You won't make the code more readable, because you can always assume that the reader of your code knows C. It rather seems like pointless wrapper functions to hide away "scary" operators that the programmer isn't familiar with.

That being said, the introduction of the functions may cause a lot of overhead code. To turn your functions back into the core operations shown above, the optimizer would have to be quite good.


If you for some reason persist in using the functions, any attempt at manual optimization is going to be questionable practice. The use of inline, register and such keywords is likely superfluous. The compiler with the optimizer enabled should be far more capable of deciding when to inline and when to put things in registers than the programmer is.

As usual, it doesn't make sense to manually optimize code, unless you know more about the given CPU than the person who wrote the compiler port for it. Most often this is not the case.

What you can harmlessly do as manual optimization is to get rid of unsigned char (you shouldn't be using the native C types for this anyhow). Instead, use the uint_fast8_t type from stdint.h. Using this type means: "I would like to have a uint8_t, but if the CPU prefers a larger type for alignment/performance reasons, it can use that instead".
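A sketch of the accessors rewritten with uint_fast8_t; the function names are illustrative, and note that on mainstream x86 targets uint_fast8_t is typically still one byte, so any gain is platform-dependent:

```c
#include <stdint.h>

/* Illustrative names, not from the question: the same accessors on
   uint_fast8_t, letting the platform pick its preferred width. */
static uint_fast8_t get_bit(uint_fast8_t byte, unsigned int n)
{
    return (byte >> n) & 1u;
}

static uint_fast8_t assign_bit(uint_fast8_t byte, unsigned int n, uint_fast8_t val)
{
    /* clear the bit, then merge in the new value (val must be 0 or 1) */
    return (uint_fast8_t)((byte & ~(1u << n)) | ((unsigned int)val << n));
}
```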


EDIT

There are different ways to set a bit to either 1 or 0. For maximum readability, you would write this:

uint8_t val = either_1_or_0;
...

if(val == 1)
  byte |= 1<<n;
else
  byte &= ~(1<<n);

This does however include a branch. Let's assume we know that the branch is a known performance bottleneck on the given system, to justify the otherwise questionable practice of manual optimization. We could then set the bit to either 1 or 0 without a branch, in the following manner:

byte = (byte & ~(1<<n)) | (val<<n);

And this is where the code is turning a bit unreadable. Read the above as:

  • Take the byte and preserve everything in it, except for the bit we want to set to 1 or 0.
  • Clear this bit.
  • Then set it to either 1 or 0.

Note that the whole right side sub-expression is pointless if val is zero. So on a "generic system" this code is possibly slower than the readable version. So before writing code like this, we would have to know that our CPU is very good at bit-flipping and not-so-good at branch prediction.

Lundin
  • Great advice! And you are correct, the functions are used primarily as an attempt to provide better code readability by an inexperienced C programmer (me). – Ivan Kovacevic Jun 02 '14 at 12:22
  • This is really a useful piece of advice – Mohit Jain Jun 02 '14 at 12:29
  • However my motivation was to combine your "set bit", "clear bit" and "toggle bit" into one function that would work independently whether you need to set a particular bit to zero or one. Otherwise I need to incorporate checking which particular operation I need to use at every place I need to toggle one bit "on" or "off" – Ivan Kovacevic Jun 02 '14 at 12:29
  • @IvanKovacevic It seems quite unlikely that your multi-purpose "set bit to 1 or 0" function is more efficient than "check bit, if 1 use |= otherwise use &= ~". However, there are several ways you can write such code without the branch. I edited my answer with some examples. – Lundin Jun 02 '14 at 13:01
  • But then again, manual optimizations to remove the branch is most likely very questionable practice. You will need in-depth knowledge of the high-end Intel CPU, which is obtained by reading a lot of _very_ boring CPU manuals, which is a task I wouldn't sign up for. Done it for PowerPC once and it was horrible reading... I would imagine that the Intel manual is even more cumbersome. – Lundin Jun 02 '14 at 13:03
  • Thanks for the edit! Your branch-less solution seems very similar to my original setBit function and I've also tested it now excessively(many repeated tries). Execution times are exactly the same as with my original setBit function. So either your way with bit-flipping or my with XORs produces the fastest results for setting bits from all the provided answers here(At least in my environment). – Ivan Kovacevic Jun 02 '14 at 13:41

You can benchmark with the following variations and keep the best of all solutions.

inline unsigned int getBit(unsigned char *byte, unsigned int bitPosition)
{
    const unsigned char mask = (unsigned char)(1U << bitPosition);
    return !!(*byte & mask);
}


inline void setBit(unsigned char *byte, unsigned int bitPosition, unsigned int bitValue)
{
    const unsigned char mask = (unsigned char)(1U << bitPosition);
    bitValue ? (*byte |= mask) : (*byte &= ~mask); /* parentheses required: assignment binds more loosely than ?: in C */
}

If your algorithm expects only a zero vs. non-zero result from getBit, you can remove the !! from the return. (To return 0 or 1, I found the version from @BobJarvis really clean.)

If your algorithm can pass the bit mask to be set or reset to setBit function, you won't need to calculate mask explicitly.

So depending on the code calling these functions, it may be possible to cut on time.
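As a sketch of that last point, a hypothetical mask-based interface would let the caller hoist the shift out of its inner loop; the names and signatures here are made up, not from the question:

```c
/* Hypothetical mask-based interface: the caller passes 1u << bitPosition,
   computed once outside its inner loop, so the shift is not redone per call. */
static unsigned int getBitM(const unsigned char *byte, unsigned char mask)
{
    return !!(*byte & mask);  /* normalize to 0 or 1 */
}

static void setBitM(unsigned char *byte, unsigned char mask, unsigned int bitValue)
{
    if (bitValue)
        *byte |= mask;
    else
        *byte &= (unsigned char)~mask;
}
```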

Mohit Jain
  • Your getBit function is faster than mine but slower than the one from unwind(Timings: 5.988453s, 5.979856s, 5.985067s. I've also tested with a bigger data sample, approximately 10 times bigger, and the difference between your getBit and unwind's getBit is around 1 second in execution time). The setBit function is slower than my original one(Timings: 6.520632s, 6.531595s, 6.526662s) – Ivan Kovacevic Jun 02 '14 at 13:27