Bitfield manipulation in C

Question

The classic problem of testing and setting individual bits in an integer in C is perhaps one of the most common intermediate-level programming skills. You set and test with simple bitmasks such as

unsigned int mask = 1<<11;

if (value & mask) {....} // Test for the bit
value |= mask;    // set the bit
value &= ~mask;   // clear the bit

An interesting blog post argues that this is error prone, difficult to maintain, and poor practice. The C language itself provides bit level access which is typesafe and portable:

typedef unsigned int boolean_t;
#define FALSE 0
#define TRUE !FALSE
typedef union {
        struct {
                boolean_t user:1;
                boolean_t zero:1;
                boolean_t force:1;
                int :28;                /* unused */
                boolean_t compat:1;     /* bit 31 */
        };
        int raw;
} flags_t;

int
create_object(flags_t flags)
{
        boolean_t is_compat = flags.compat;

        if (is_compat)
                flags.force = FALSE;

        if (flags.force) {
                [...]
        }
        [...]
}

But this makes me cringe.

The interesting argument my coworker and I had about this is still unresolved. Both styles work, and I maintain that the classic bitmask method is easy, safe, and clear. My coworker agrees it's common and easy, but argues that the bitfield union method is worth the extra few lines to make the code portable and safer.

Are there any more arguments for either side? In particular, is there some possible failure, perhaps with endianness, that the bitmask method may miss but where the structure method is safe?

How does `flags.compat` and `flags.force` work? Shouldn't they be `flags.{structure_name}.compat` and `flags.{structure_name}.force`? — Lazer, Mar 19 '10 at 18:40
@Lazer you can refer to the members of unnamed struct/union directly in gcc GNU extension, and it is now in C11 standard. — syockit, Sep 29 '16 at 05:30
The use of bit fields in this manner is highly non-portable. See https://en.cppreference.com/w/c/language/bit_field Notes section: alignment and ordering of bit fields is unspecified. — natersoz, Apr 21 '20 at 14:00
*The C language itself provides bit level access which is typesafe and portable* Typesafe? Yes. Portable? [***NOT AT ALL***](http://port70.net/~nsz/c/c11/n1570.html#6.7.2.1p11): "An implementation may allocate **any addressable storage unit large enough to hold a bit-field**. ... **whether a bit-field that does not fit is put into the next unit or overlaps adjacent units is implementation-defined**. The **order of allocation** of bit-fields within a unit ... **is implementation-defined**. The **alignment** of the addressable storage unit **is unspecified**." — Andrew Henle, Sep 13 '22 at 10:12

score 46 · Accepted Answer · answered Jun 25 '09 at 15:48


Bitfields are not quite as portable as you think, as "C gives no guarantee of the ordering of fields within machine words" (The C Book).

Ignoring that, used correctly, either method is safe. Both methods also allow symbolic access to integral variables. You can argue that the bitfield method is easier to write, but it also means more code to review.
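One way to see the ordering issue on a given toolchain is a small probe. The sketch below uses hypothetical names (`probe_flags`, `raw_image_with_last_set`) and assumes `unsigned int` is at least 32 bits wide. It sets only the last declared one-bit field and copies the object's bytes into an integer. Which bit of the integer ends up set is implementation-defined, so the value may differ between compilers and targets.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: two one-bit flags separated by 30 unused bits. */
struct probe_flags {
    unsigned int first : 1;
    unsigned int       : 30;
    unsigned int last  : 1;
};

/* Returns the first bytes of the object, viewed as an integer, after
 * setting only `last`. The position of the set bit depends on the
 * compiler's allocation order and on the target's byte order. */
uint32_t raw_image_with_last_set(void)
{
    struct probe_flags f;
    uint32_t raw = 0;
    size_t n = sizeof raw < sizeof f ? sizeof raw : sizeof f;

    memset(&f, 0, sizeof f);   /* start from all-zero bytes */
    f.last = 1;
    memcpy(&raw, &f, n);       /* copy bytes; avoids union punning */
    return raw;
}
```

Printing the result with `printf("%08lx\n", (unsigned long)raw_image_with_last_set());` on each target you care about shows whether the layouts agree.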


Matthew Flaschen


I've had issues with porting code to a compiler where the stupid bit field order was backwards. Very annoying. I'll stick with masks, thanks. :) – darron Jun 25 '09 at 17:01
@Matthew Flaschen: How does `flags.compat` and `flags.force` work? Shouldn't they be `flags.{structure_name}.compat` and `flags.{structure_name}.force`? – Lazer Mar 19 '10 at 18:41
@eSKay, you are correct. The C standard does not support unnamed struct fields (which are being used in the question), but gcc (http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Unnamed-Fields.html) and Visual C++ (http://msdn.microsoft.com/en-us/library/z2cx9y4f.aspx) do as extensions – Matthew Flaschen Mar 20 '10 at 04:31

There is no 'backwards' since forwards is not defined. Stay away. I have regretted ever using a bit-field as a means for extracting memory from a byte array of data. – natersoz Apr 21 '20 at 14:02

score 31 · Answer 2 · answered Jun 25 '09 at 17:35


If the issue is that setting and clearing bits is error prone, then the right thing to do is to write functions or macros to make sure you do it right.

// off the top of my head
#define SET_BIT(val, bitIndex) val |= (1 << bitIndex)
#define CLEAR_BIT(val, bitIndex) val &= ~(1 << bitIndex)
#define TOGGLE_BIT(val, bitIndex) val ^= (1 << bitIndex)
#define BIT_IS_SET(val, bitIndex) (val & (1 << bitIndex))

Which makes your code readable if you don't mind that val has to be an lvalue except for BIT_IS_SET. If that doesn't make you happy, then you take out assignment, parenthesize it and use it as val = SET_BIT(val, someIndex); which will be equivalent.
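If macro pitfalls are a concern, the same operations can be sketched as small inline functions that return the new value, which sidesteps the lvalue and precedence questions. The names below are hypothetical, and the sketch assumes a 32-bit flag word with `bit_index` in the range 0..31.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers; bit_index must be in the range 0..31. */
static inline uint32_t bit_set(uint32_t val, unsigned bit_index)
{
    return val | (UINT32_C(1) << bit_index);
}

static inline uint32_t bit_clear(uint32_t val, unsigned bit_index)
{
    return val & ~(UINT32_C(1) << bit_index);
}

static inline uint32_t bit_toggle(uint32_t val, unsigned bit_index)
{
    return val ^ (UINT32_C(1) << bit_index);
}

static inline bool bit_is_set(uint32_t val, unsigned bit_index)
{
    return (val & (UINT32_C(1) << bit_index)) != 0;
}
```

Usage follows the "take out the assignment" form: `val = bit_set(val, someIndex);`. Because the arguments are ordinary function parameters, an expression such as `bit_set(val, base + 1)` is evaluated once and needs no extra parentheses.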

Really, the answer is to consider decoupling what you want from how you want to do it.


plinth


Actually, I've gone one step further here, and implemented a class called Flags that provides member functions for manipulation of bit flags. My base class uses atomic operations (handy when setting and testing in different threads), and I've even derived from it in a few cases to store flag state in the Windows registry automatically when bits are changed. And being a class, means I can easily derive from it to simply add status flag support to any object that needs such functionality. – Aug 09 '12 at 14:06

I would advise to parenthesize at least the `bitIndex` macro argument in the macro expansion: `#define SET_BIT(val, bitIndex) val |= (1 << (bitIndex))` to avoid potential precedence issues if the argument is an expression. – chqrlie Mar 25 '18 at 15:59
Yes, but that is only for [single bits](https://stackoverflow.com/questions/47981/) (one bit at a time). – Peter Mortensen Sep 13 '22 at 10:02

score 24 · Answer 3 · answered Jun 27 '09 at 21:29


Bitfields are great and easy to read, but unfortunately the C language does not specify the layout of bitfields in memory, which means they are essentially useless for dealing with packed data in on-disk formats or binary wire protocols. If you ask me, this decision was a design error in C—Ritchie could have picked an order and stuck with it.
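For wire formats, a common alternative is to decode the bytes explicitly with shifts and masks, so the result does not depend on how any compiler lays out bit-fields or on the host's byte order. The sketch below uses hypothetical names (`ipv4_prefix`, `decode_ipv4_prefix`) and decodes only the first four bytes of an IPv4 header: the version and header-length nibbles in byte 0, and the big-endian total length in bytes 2 and 3.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical holder for a few decoded IPv4 header fields. */
struct ipv4_prefix {
    unsigned version;       /* high 4 bits of byte 0 */
    unsigned header_words;  /* low 4 bits of byte 0 (IHL) */
    uint16_t total_length;  /* bytes 2..3, big-endian on the wire */
};

/* Returns 0 on success, -1 if fewer than 4 bytes are available. */
int decode_ipv4_prefix(const unsigned char *buf, size_t len,
                       struct ipv4_prefix *out)
{
    if (len < 4)
        return -1;

    out->version      = (buf[0] >> 4) & 0x0Fu;
    out->header_words = buf[0] & 0x0Fu;
    /* Assemble the 16-bit value byte by byte: host byte order is irrelevant. */
    out->total_length = (uint16_t)(((unsigned)buf[2] << 8) | buf[3]);
    return 0;
}
```

The bytes can come straight from `fread` into an `unsigned char` array; only the decoding step touches individual bits.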


Norman Ramsey


I have a question for you: can I read a binary file with a specific structure (such as one storing TCP/IP packets) directly into a data structure containing bit-fields (so I would have defined my own "struct Packet {..}"), rather than first reading bytes into a char array and then processing them one by one? As you mentioned that "C does not specify the bitfield layout in memory", does it mean that the wrong bits can get assigned to the wrong bitfields of my data structure if I do: Packet* p = malloc(sizeof(Packet)); fread(p, 1, sizeof(Packet), fp) - is it possible to do something like that? – mercury0114 Jan 15 '16 at 14:08
@mercury0114: doing that would not be reliable for code that hopes to be portable across various compilers and platforms. Hence the last sentence of the answer above. – lindes Apr 14 '19 at 02:14
Late comment for late readers: Such structs (@mercury0114) to be sent over TCP_IP can bring trouble even without bit-fields, as different compilers might apply different alignment for the members, so they can be a portability issue anyway! – Aconcagua Jun 06 '19 at 14:33

Doug T. · Answer 4 · 2009-06-25T17:12:39.380

You have to think about this from the perspective of a writer -- know your audience. So there are a couple of "audiences" to consider.

First there are the classic C programmers, who have bitmasked their whole lives and could do it in their sleep.

Second there's the newb, who has no idea what all this |, & stuff is. They were programming php at their last job and now they work for you. (I say this as a newb who does php)

If you write to satisfy the first audience (that is, bitmask-all-day-long), you'll make them very happy, and they'll be able to maintain the code blindfolded. However, the newb will likely need to overcome a large learning curve before they are able to maintain your code. They will need to learn about binary operators, how you use these operations to set/clear bits, etc. You're almost certainly going to have bugs introduced by the newb as he/she learns all the tricks required to get this to work.

On the other hand, if you write to satisfy the second audience, the newbs will have an easier time maintaining the code. They'll have an easier time grokking

 flags.force = 0;

than

 flags &= 0xFFFFFFFE;

and the first audience will just get grumpy, but it's hard to imagine they wouldn't be able to grok and maintain the new syntax. It's just much harder to screw up. There won't be new bugs, because the newb will more easily maintain the code. You'll just get lectures about how "back in my day you needed a steady hand and a magnetized needle to set bits... we didn't even HAVE bitmasks!" (thanks XKCD).

So I would strongly recommend using the fields over the bitmasks to newb-safe your code.
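A hard-coded constant like `0xFFFFFFFE` also ties the code to a 32-bit flag word, as a comment on this answer points out. The sketch below uses hypothetical names (`FLAG_BIT0`, `clear_bit0_hardcoded`, `clear_bit0_portable`) and a 64-bit flag word to show the difference. The key assumption is that the mask is defined with the same width as the flags, so that `~` inverts all 64 bits.

```c
#include <stdint.h>

/* Hypothetical flag, defined with the same width as the flag word. */
#define FLAG_BIT0 (UINT64_C(1) << 0)

/* The literal has only 32 significant bits, so bits 32..63 are
 * cleared along with bit 0. */
uint64_t clear_bit0_hardcoded(uint64_t flags)
{
    return flags & 0xFFFFFFFE;
}

/* ~FLAG_BIT0 is computed in 64 bits, so only bit 0 is cleared. */
uint64_t clear_bit0_portable(uint64_t flags)
{
    return flags & ~FLAG_BIT0;
}
```

Note that the width of the mask matters for the inverted form too: `~1u` is still a 32-bit value, which is why the mask is built with `UINT64_C`.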

"It's just much harder to screw up." Bitfields, in a nutshell. — Roddy, Jun 25 '09 at 16:01
That |= sets a whole lot of flag bits. flags **&=** 0xFFFFFFFE; maybe? you're actually helping the bit field position. And, based on his definition, it would be 0xFFFFFFFB on systems where the top bitfield is bit 0. — darron, Jun 25 '09 at 17:00
Yes, you are correct, dblack. But I guess that proves the overall point :) — Doug T., Jun 25 '09 at 17:13
To clear bitmask N, I'd use flags &= ~N; This has the advantage of not being tied to the size of flags. Run your code with 64-bit ints and you are slightly scr*wed... — Roddy, Jun 25 '09 at 19:21
I really wouldn't consider bit masking that hard. 30 minutes (or less) worth of reading on Wikipedia and you've got enough to grok what's going on. — Bob Somers, Jun 26 '09 at 07:46

score 14 · Answer 5 · answered Jun 27 '09 at 17:39

The union usage has undefined behavior according to the ANSI C standard, and thus, should not be used (or at least not be considered portable).

From the ISO/IEC 9899:1999 (C99) standard:

Annex J - Portability Issues:

1 The following are unspecified:

— The value of padding bytes when storing values in structures or unions (6.2.6.1).

— The value of a union member other than the last one stored into (6.2.6.1).

6.2.6.1 - Language Concepts - Representation of Types - General:

6 When a value is stored in an object of structure or union type, including in a member object, the bytes of the object representation that correspond to any padding bytes take unspecified values. The value of a structure or union object is never a trap representation, even though the value of a member of the structure or union object may be a trap representation.

7 When a value is stored in a member of an object of union type, the bytes of the object representation that do not correspond to that member but do correspond to other members take unspecified values.

So, if you want to keep the bitfield ↔ integer correspondence and stay portable, I strongly suggest using the bitmasking method, which, contrary to the linked blog post, is not poor practice.
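One way to keep a well-defined correspondence without overlaying a union is to convert explicitly between a plain struct of flags and the integer. This is a minimal sketch with hypothetical names (`struct flags`, `flags_pack`, `flags_unpack`). It assumes `user`, `zero`, and `force` occupy bits 0 to 2 and `compat` bit 31, which is one reading of the layout in the question, not something the standard guarantees for the original union.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit assignments (hypothetical, chosen by the programmer). */
#define FLAG_USER   (UINT32_C(1) << 0)
#define FLAG_ZERO   (UINT32_C(1) << 1)
#define FLAG_FORCE  (UINT32_C(1) << 2)
#define FLAG_COMPAT (UINT32_C(1) << 31)

struct flags {
    bool user, zero, force, compat;
};

/* Build the integer form; the bit positions are fixed by the masks. */
uint32_t flags_pack(struct flags f)
{
    uint32_t raw = 0;
    if (f.user)   raw |= FLAG_USER;
    if (f.zero)   raw |= FLAG_ZERO;
    if (f.force)  raw |= FLAG_FORCE;
    if (f.compat) raw |= FLAG_COMPAT;
    return raw;
}

/* Recover the named flags from the integer form. */
struct flags flags_unpack(uint32_t raw)
{
    struct flags f = {
        .user   = (raw & FLAG_USER)   != 0,
        .zero   = (raw & FLAG_ZERO)   != 0,
        .force  = (raw & FLAG_FORCE)  != 0,
        .compat = (raw & FLAG_COMPAT) != 0,
    };
    return f;
}
```

Code inside the program can still write `f.force = false;`, while anything that leaves the program (a file, a register, a network packet) goes through `flags_pack`.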

Annex J is non-normative. Also , Annex J says the value is unspecified, not that there is undefined behaviour as you claim. (Which is a moot point since it's non-normative). The 6.2.6.1/7 refers to bytes of the union object which do not lie within the subobject being written. It was clarified via a defect report that writing one union member and reading another behaves like type punning; the clarified wording appears in the C11 standard (which cancels and replaces older standards) — M.M, Oct 31 '18 at 01:12

Roddy · Answer 6 · 2009-06-25T16:00:09.307

What is it about the bitfield approach that makes you cringe?

Both techniques have their place, and the only decision I have is which one to use:

For simple "one-off" bit fiddling, I use the bitwise operators directly.

For anything more complex - eg hardware register maps, the bitfield approach wins hands down.

Bitfields are more succinct to use (at the expense of /slightly/ more verbosity to write).
Bitfields are more robust (what size is "int", anyway)
Bitfields are usually just as fast as bitwise operators.
Bitfields are very powerful when you have a mix of single and multiple bit fields, and extracting the multiple-bit field involves loads of manual shifts.
Bitfields are effectively self-documenting. By defining the structure and therefore naming the elements, I know what it's meant to do.
Bitfields also seamlessly handle structures bigger than a single int.
With bitwise operators, typical (bad) practice is a slew of #defines for the bit masks.
The only caveat with bitfields is to make sure the compiler has really packed the object into the size you wanted. I can't remember if this is defined by the standard, so an assert(sizeof(myStruct) == N) is a useful check.
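The size check from the last point can be done at compile time in C11. The sketch below uses a hypothetical register layout (`ctrl_reg`) with a multi-bit field, and shows the manual shift-and-mask extraction next to it for comparison. The shift value of 4 assumes fields are numbered from the least significant bit, which is a choice made for the mask version only; the bit-field version leaves placement to the compiler.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>

/* Hypothetical register layout: the field widths sum to 32 bits. */
struct ctrl_reg {
    unsigned int enable  : 1;
    unsigned int mode    : 3;
    unsigned int divider : 12;
    unsigned int         : 16;  /* unused */
};

/* Fails the build if the compiler did not pack the struct into 32 bits. */
static_assert(sizeof(struct ctrl_reg) == sizeof(uint32_t),
              "ctrl_reg is not packed into 32 bits on this compiler");

/* With a bit-field, reading the multi-bit field is a plain member access. */
unsigned get_divider_by_field(struct ctrl_reg r)
{
    return r.divider;
}

/* The same field extracted by hand from a raw word. */
#define DIVIDER_SHIFT 4
#define DIVIDER_MASK  (UINT32_C(0xFFF) << DIVIDER_SHIFT)

uint32_t get_divider_by_mask(uint32_t raw)
{
    return (raw & DIVIDER_MASK) >> DIVIDER_SHIFT;
}
```

The static assertion checks only the total size. It says nothing about where each field sits inside the word, so it does not replace checking the compiler documentation when the layout has to match hardware.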

I have found as I move from one embedded platform to another that it often requires writing sample code and checking the generated assembly to confirm the mapping from the bitfield in a struct to the actual machine word. Too few compilers organize their documentation well enough to make this easy to look up. I've also had problems where the hardware requires that memory accesses be guaranteed to be a particular size, and I had to find a barely documented switch to turn off the optimization that broke that. — RBerteig, Jun 26 '09 at 07:35
The other downside you haven't mentioned is that you have to check whether the bits are ordered from most significant bit to least significant bit, or vice versa. — Jason S, Feb 08 '12 at 16:08

qrdl · Answer 7 · 2009-06-26T07:27:43.413


The blog post you are referring to mentions raw union field as alternative access method for bitfields.

The blog post author's uses of raw are fine; however, if you plan to use it for anything else (e.g. serialisation of bit fields, setting/checking individual bits), disaster is waiting for you around the corner. The ordering of bits in memory is architecture dependent and memory padding rules vary from compiler to compiler (see Wikipedia), so the exact position of each bitfield may differ; in other words, you can never be sure which bit of raw each bitfield corresponds to.

However, if you don't plan to mix the two, you had better take raw out, and you will be safe.

edited Jun 26 '09 at 07:27

answered Jun 25 '09 at 15:49


Endianness should not affect it providing you use concrete representation for masks. This way, both are affected equally by endianness meaning that high/low bits correlate. – Aiden Bell Jun 25 '09 at 16:13

@Aiden, I think his point is that the mapping between the raw field and the members of the (anonymous) struct is HIGHLY platform dependent. I've been burned badly in embedded projects by this in the past, just trying to write a struct that matched a datasheet description of a register. The fact that the #$@(&* manufacturer numbered the bits such that bit 0 was the HIGH ORDER BIT didn't help at all, of course! – RBerteig Jun 26 '09 at 07:42

@RBerteig Exactly it was my point! In initial edition of my post I also mentioned endianness but because it is not guaranteed even on architectures with same endianness I took it out. – qrdl Jun 26 '09 at 08:03

score 6 · Answer 8 · edited Jun 27 '09 at 21:37

Well, you can't go wrong with structure mapping: since both fields are accessible, they can be used interchangeably.

One benefit of bitmasks is that you can easily aggregate options:

mask = USER|FORCE|ZERO|COMPAT;

vs

flags.user = true;
flags.force = true;
flags.zero = true;
flags.compat = true;

In some environments such as dealing with protocol options it can get quite old having to individually set options or use multiple parameters to ferry intermediate states to effect a final outcome.
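Aggregated masks also make combined tests short. The following is a minimal sketch with hypothetical names (`OPT_*`, `has_all`, `has_any`). It keeps all flags in the low bits because enumeration constants must fit in an `int` before C23, so a flag at bit 31 would need a `#define` instead.

```c
#include <stdbool.h>

/* Hypothetical option flags; each occupies one bit. */
enum option_flags {
    OPT_USER   = 1 << 0,
    OPT_ZERO   = 1 << 1,
    OPT_FORCE  = 1 << 2,
    OPT_COMPAT = 1 << 3
};

/* True when every bit in `required` is set in `flags`. */
bool has_all(unsigned flags, unsigned required)
{
    return (flags & required) == required;
}

/* True when at least one bit in `candidates` is set in `flags`. */
bool has_any(unsigned flags, unsigned candidates)
{
    return (flags & candidates) != 0;
}
```

A caller can then pass one argument, for example `open_thing(OPT_USER | OPT_FORCE)` (where `open_thing` is a made-up function), and the callee checks combinations with a single call such as `has_all(flags, OPT_USER | OPT_FORCE)`.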

But sometimes setting flag.blah and having the list pop up in your IDE is great, especially if you're like me and can't remember the name of the flag you want to set without constantly referencing the list.

I personally will sometimes shy away from declaring boolean types because at some point I'll end up with the mistaken impression that the field I just toggled was not dependent (Think multi-thread concurrency) on the r/w status of other "seemingly" unrelated fields which happen to share the same 32-bit word.

My vote is that it depends on the context of the situation and in some cases both approaches may work out great.

Aiden Bell · Answer 9 · 2009-06-25T16:14:51.043

Either way, bitfields have been used in GNU software for decades and it hasn't done them any harm. I like them as parameters to functions.

I would argue that bitfields are conventional as opposed to structs. Everyone knows how to AND the values to set various options off and the compiler boils this down to very efficient bitwise operations on the CPU.

Provided you use the masks and tests in the correct way, the abstractions the compiler provides should make it robust, simple, readable and clean.

When I need a set of on/off switches, I'm going to continue using them in C.

score 5 · Answer 10 · answered Jun 26 '09 at 01:01


In C++, just use std::bitset<N>.


Steve Jessop


Daniel Daranas · Answer 11 · 2014-12-11T13:01:59.993


It is error-prone, yes. I've seen lots of errors in this kind of code, mainly because some people feel that they should mess with it and the business logic in a totally disorganized way, creating maintenance nightmares. They think "real" programmers can write value |= mask; , value &= ~mask; or even worse things at any place, and that's just ok. Even better if there's some increment operator around, a couple of memcpy's, pointer casts and whatever obscure and error-prone syntax happens to come to their mind at that time. Of course there's no need to be consistent and you can flip bits in two or three different ways, distributed randomly.

My advice would be:

Encapsulate this ---- in a class, with methods such as SetBit(...) and ClearBit(...). (If you don't have classes in C, in a module.) While you're at it, you can document all their behaviour.
Unit test that class or module.

edited Dec 11 '14 at 13:01

answered Jun 26 '09 at 07:37


Class, methods, module? I can't understand a word you're saying. – Nosredna Jun 27 '09 at 21:51

@Nosredna: You could always ask. – Daniel Daranas Jun 29 '09 at 06:47
Increment operators, memcpy's, and pointer casts - do you seriously call that obscure? References, copy constructors, templates, and operator overloading, now that's obscure! – James Morris Oct 26 '09 at 18:56

I mentioned "increment operators" around as a way of saying that there's a mixture of queries and commands in the same instruction. And memcpy's are too low level, hence obscure. It's the result of the typical low level salad that messes around with byte operation and several value modifications at the same time that I call, as a result, obscure. You can write really hard to maintain code which, read line by line, is clear what it "does", but nobody knows really why. – Daniel Daranas Oct 26 '09 at 21:54

score 3 · Answer 12 · answered Jun 25 '09 at 17:39

Your first method is preferable, IMHO. Why obfuscate the issue? Bit fiddling is a really basic thing. C did it right. Endianness doesn't matter. The only thing the union solution does is name things. 11 might be mysterious, but #defined to a meaningful name or enum'ed should suffice.

Programmers who can't handle fundamentals like "|&^~" are probably in the wrong line of work.


xcramps

-1: "like '|&^~' are probably in the wrong line of work.". Although a C programmer myself ... I can appreciate bitwise operations being confusing for programmers of higher-level languages. Doesn't make them any less of a programmer than an Assembly guru not understanding OOP. – Aiden Bell Jun 25 '09 at 18:35

@Aiden - I disagree. No matter what kind of programmer you are, even if you spend your entire day writing web apps with functional scripting meta-languages, if you don't understand bits and bytes you're missing the fundamental underpinnings of what is happening. @xcramps, endianness absolutely matters (especially for anything using a network) if you're dealing with anything larger than a byte, like the op's example which dealt with 32-bit ints. – Bob Somers Jun 26 '09 at 07:50

score 2 · Answer 13 · edited Sep 13 '22 at 10:05


When I google for "C operators", the first three pages are:

..so I think that argument about people new to the language is a little silly.

edited Sep 13 '22 at 10:05

Peter Mortensen


answered Jun 25 '09 at 17:57

San Jacinto


I agree wholeheartedly. I believe twenty minutes is enough to get a pretty good grasp on bitwise operators. – Adrian Panasiuk Jun 25 '09 at 18:25
The second and third link are broken, *"502 Bad Gateway"* and *"Forbidden. You don't have permission to access this resource."*, respectively. – Peter Mortensen Sep 13 '22 at 10:06

score 2 · Answer 14 · answered Jun 26 '09 at 09:27

I nearly always use the logical operations with a bit mask, either directly or as a macro. e.g.

 #define  ASSERT_GPS_RESET()                    { P1OUT &= ~GPS_RESET ; }

Incidentally, your union definition in the original question would not work on my processor/compiler combination. The int type is only 16 bits wide and the bitfield definitions are 32. To make it slightly more portable, you would have to define a new 32-bit type that you could then map to the required base type on each target architecture as part of the porting exercise. In my case

typedef   unsigned long int     uint32_t;

and in the original example

typedef unsigned int uint32_t;

typedef union {
        struct {
                boolean_t user:1;
                boolean_t zero:1;
                boolean_t force:1;
                int :28;                /* unused */
                boolean_t compat:1;     /* bit 31 */
        };
        uint32_t raw;
} flags_t;

The overlaid int should also be made unsigned.
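On toolchains that provide C99's `<stdint.h>`, the per-target typedef can be replaced by the standard `uint32_t`. The sketch below (hypothetical names `flags32_t` and `flags32_from_raw`) also makes every member unsigned, as suggested above. It relies on two things that are not universal: C11 anonymous structs, and the compiler accepting `uint32_t` as a bit-field type, which the standard leaves implementation-defined when `uint32_t` is not `unsigned int`. The mapping between `raw` and the named fields remains implementation-defined.

```c
#include <stdint.h>

/* Hypothetical variant of the question's union with fixed-width,
 * unsigned members. */
typedef union {
    struct {
        uint32_t user   : 1;
        uint32_t zero   : 1;
        uint32_t force  : 1;
        uint32_t        : 28;   /* unused */
        uint32_t compat : 1;
    };
    uint32_t raw;
} flags32_t;

/* Builds a flags32_t from a raw word. */
flags32_t flags32_from_raw(uint32_t raw)
{
    flags32_t f;
    f.raw = raw;
    return f;
}
```

With `raw` set to 0 every flag reads as 0, and with all bits set every flag reads as 1 on implementations that pack the struct into the same 32 bits; any other value depends on the compiler's field order.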

score 2 · Answer 15 · answered Jun 27 '09 at 15:48

Well, I suppose that's one way of doing it, but I would always prefer to keep it simple.

Once you're used to it, using masks is straightforward, unambiguous and portable.

Bitfields are straightforward, but they are not portable without having to do additional work.

If you ever have to write MISRA-compliant code, the MISRA guidelines frown on bitfields, unions, and many, many other aspects of C, in order to avoid undefined or implementation-dependent behaviour.

score 1 · Answer 16 · answered Jun 25 '09 at 16:21


Generally, the one that is easier to read and understand is the one that is also easier to maintain. If you have co-workers that are new to C, the "safer" approach will probably be the easier one for them to understand.


Mike


I look at the two pieces of code, and to me the first one looks easier to understand. The second one may well be safer. – Nosredna Jun 27 '09 at 21:39
Personally, I agree with you. However, maybe it's just me, but I have run into a lot of "C experts" who have no knowledge of what | or & do. – Mike Jun 28 '09 at 16:31

score 1 · Answer 17 · 2016-08-30T00:43:54.950

Bitfields are great, except that the bit manipulation operations are not atomic, and can thus lead to problems in multi-threaded applications.

For example one could assume that a macro:

#define SET_BIT(val, bitIndex) val |= (1 << bitIndex)

Defines an atomic operation, since |= is one statement. But the ordinary code generated by a compiler will not try to make |= atomic.

So if multiple threads execute different set-bit operations, one of those operations can be lost, since both threads will execute:

  thread 1             thread 2
  LOAD field           LOAD field
  OR mask1             OR mask2
  STORE field          STORE field

The result can be field' = field OR mask1 OR mask2 (intended), or the result can be field' = field OR mask1 (not intended) or the result can be field' = field OR mask2 (not intended).
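With a C11 compiler, the lost-update case above can be avoided by making the flag word atomic and using the read-modify-write operations from `<stdatomic.h>`. This is a minimal sketch with hypothetical names (`shared_flags`, `set_flag_bits`, `clear_flag_bits`, `read_flags`). It applies to a mask-based flag word only; bit-field members cannot be used this way because they have no address.

```c
#include <stdatomic.h>

/* Hypothetical shared flag word; static storage starts at zero. */
static atomic_uint shared_flags;

/* Atomically ORs `mask` into the flags; returns the previous value. */
unsigned set_flag_bits(unsigned mask)
{
    return atomic_fetch_or(&shared_flags, mask);
}

/* Atomically clears the bits in `mask`; returns the previous value. */
unsigned clear_flag_bits(unsigned mask)
{
    return atomic_fetch_and(&shared_flags, ~mask);
}

/* Atomically reads the current flags. */
unsigned read_flags(void)
{
    return atomic_load(&shared_flags);
}
```

Each call performs the load, the OR or AND, and the store as one indivisible step, so two threads setting different bits both take effect.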

Re *"Defines an atomic operation, since |= is one statement"*: Is that guaranteed? Isn't it entirely dependent on the underlying implementation (e.g., hardware)? — Peter Mortensen, Sep 13 '22 at 10:10

score 0 · Answer 18 · answered Feb 25 '20 at 05:02

I'm not adding much to what's already been said, except to emphasize two points:

The compiler is free to arrange bits within a bitfield any way it wants. This means that if you're trying to manipulate bits in a microcontroller register, or if you want to send the bits to another processor (or even the same processor with a different compiler), you MUST use bitmasks.

On the other hand, if you're trying to create a compact representation of bits and small integers for use within a single processor, bitfields are easier to maintain and thus less error prone, and -- with most compilers -- are at least as efficient as manually masking and shifting.
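For the register case, the mask approach is typically written against a `volatile` pointer so that each access really reaches the hardware. The sketch below uses made-up register bits (`UART_CTRL_ENABLE`, `UART_CTRL_PARITY_*`) and hypothetical helper names; real positions would come from the device's datasheet.

```c
#include <stdint.h>

/* Made-up bit assignments for an imaginary UART control register. */
#define UART_CTRL_ENABLE       (UINT32_C(1) << 0)
#define UART_CTRL_PARITY_SHIFT 4
#define UART_CTRL_PARITY_MASK  (UINT32_C(0x3) << UART_CTRL_PARITY_SHIFT)

/* Sets the bits in `mask` with a read-modify-write of the register. */
void reg_set_bits(volatile uint32_t *reg, uint32_t mask)
{
    *reg |= mask;
}

/* Clears the bits in `mask` with a read-modify-write of the register. */
void reg_clear_bits(volatile uint32_t *reg, uint32_t mask)
{
    *reg &= ~mask;
}

/* Replaces a multi-bit field using exactly one read and one write. */
void reg_write_field(volatile uint32_t *reg, uint32_t mask,
                     unsigned shift, uint32_t value)
{
    uint32_t tmp = *reg;
    tmp = (tmp & ~mask) | ((value << shift) & mask);
    *reg = tmp;
}
```

Because the bit positions are spelled out in the masks, the same source produces the same register contents with any conforming compiler.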
