96-bit long bitfield with non-octet aligned subfields

Question

I need a 96-bit long structure that I can place custom bit fields into. The fields' lengths are all over the place, 8, 3, 26, 56. It's important that they remain these exact lengths (with one exception, see below).

I see numerous ways of concatenating the data into a single, compact field: std::bitset, structs (holding the fields contiguously), and of course just using ints. However:

The bitset approach is problematic, because operations need to happen really fast: bitset doesn't provide a method to instantly set range (x..y) out of the whole range (0..96), with one atomic operation. Damned if I'm going to loop to set individual bits.
The struct approach is problematic, because of this restriction on length.
The int approach is problematic, because int64_t is not long enough. I can of course use an int32_t alongside this, however see below.

One solution that is obvious is to put the 56 + 8 fields into an int64_t, and the rest into an int32_t. The problem here is that the 56-long field is the only one which may in fact be reduced later on in development, and that will mean I will have some spare bits in the int64_t, and some 32 - (26 + 3) = 3 spare bits in the int32_t.

Are there any ways to store these as compactly as possible (from a code standpoint) while still being able to access broad areas by masking (unlike std::bitset)?

Write a clean encapsulating interface, then you can change it later without hassle. — Kerrek SB, Nov 15 '11 at 13:52
why can't you use a struct with bitfields? do they have to be consecutive for some reason? `struct A { int64_t f : 56; int x: 8; };` etc. would work just fine when your compiler has a packing extension... — PlasmaHH, Nov 15 '11 at 13:55
Have you tried the `bitset` option and checked the assembly output? It may be that GCC actually optimizes the loop away. — Fred Foo, Nov 15 '11 at 13:56
erm, the lowly `bitset` supports binary operators, so for example to set a range of say 10 bits, you can contruct it with some value which has those 10 bits set, shift it to the appropriate place and/or/xor etc. — Nim, Nov 15 '11 at 14:01

score 3 · Accepted Answer · answered Nov 15 '11 at 16:34

3

Ok, you have a classic size vs speed situation here. I'm going to ask, is this a situation where every bit does matter? Is it that big of a deal if a couple of bits are not quite used? The C coder in me likes either an array of 3 32-bit values, or the 64-bit, 32-bit value approach. The optimizer in me doesn't like the fact that 96-bit data structures are not completely cache friendly and would rather be padded to 128-bits, or at least not accessed across 4-byte boundaries as much as possible.

Using a 64-bit value, depending on your target platform, allows masking of the entire 56-bit entry in 1 instructions, while the 32-bit version would require at least 2 operations. But if you could get that value down to 32-bits (or up to 64-bits), then, no masking at all and full speed ahead provided you keep that 64-bit value on 64-bit address boundaries. Some targets will allow you to access the data non-align at a penalty whereas others will actually throw exceptions.

The safest way is the array of 3 32-bit values. Your alignment is guaranteed, the math can be kept simple as long as you don't span 32-bit boundaries with your bitfields, and it will be the most portable. If you must span a boundary, you will take a speed hit with extra masking and shifting. But, and this is the big question, are your really, Really sure that accessing this data is a speed concern? You have a profile in hand showing that this is a bottleneck? If not, I'd just go with the bitfield C++ solution and call it good. Safer and easier to use is pretty much always a win.

answered Nov 15 '11 at 16:34

Michael Dorgan

12,453
3
31
61

The overall data structure totals 384 bits (48 bytes), because there are additionally 9 mandatory 32-bit pointers. Short of implementing shorter pointers, it won't get any smaller, so it's the best 16-byte aligned structure I'm able to do at this stage. Re a 32 and a 64, I'm still considering it. Re operations; I was actually wondering if it's just a fact of life that I will need 2 operations minimum for the 32-bit version, I actually just got an answer on that [here](http://stackoverflow.com/questions/8138849/how-to-work-with-bitfields-longer-than-64-bits/8138976#comment9987869_8138976). – Engineer Nov 15 '11 at 16:46
You've got me thinking about this further. I've decided to go for the original 32/64 approach. There will be one boundary crossing (for the 26 bit field) but that field itself is a major optimisation that cannot be foregone (precalculated data that otherwise would have to be recalculated every frame). So it's _well_ worth the cost. Also , you said, "provided you keep that 64-bit value on 64-bit address boundaries". Is there any way that wouldn't happen, given I'd simply be declaring the class as 9 pointers, 1 32-bit, and 1 64-bit int? – Engineer Nov 15 '11 at 17:14
Separating them into 2 arrays? Nope, that guarantees alignment as long as your compiler doesn't suck :) One gotcha with this approach though is 2 arrays means 2 locations and at least 2 cache pages. Access across the page will be slower than you think because of this. Locality is #1 for performance once your algorithm is chosen correctly. Just to be sure, you means val32[9]; val64[9]; in your class, not an array of the classes themselves, which would not be 64-bit aligned without compiler help (and padding added). – Michael Dorgan Nov 15 '11 at 17:48
Ah, no, misunderstanding there. No arrays at all. Class members are `int32_t flagsA`, `int64_t flagsB`, and the 9x 32-bit pointers. That should be aligned, I believe? Doing a sizeof, I get 48 bytes, i.e. there is no padding at all. GCC 4.6.1, btw, default compiler options and no fancy aligment stuff specified anywhere. – Engineer Nov 15 '11 at 18:03

score 2 · Answer 2 · answered Nov 15 '11 at 14:01

2

uint32_t bits[3];

There, 96 bits, in a POD type which you can poke around with as much as you like.

Of course, if you want speed, it's very likely that not using a single bit field would speed things up. But if you do want to pack your data at this level, and the std::bitset interface is too constraining, a simple array is what I'd use.

answered Nov 15 '11 at 14:01

jalf

243,077
51
345
550

Thanks jalf, this could be the perfect solution. I'll check it out and come back here when I have. I've not done this sort of thing in C++, so it's not necessarily as obvious to me that this will work, as it might be to others! This is for an octree for volume rendering, so yes, the packing is absolutely necessary due to the sheer volume of data. – Engineer Nov 15 '11 at 14:24

score 2 · Answer 3 · answered Nov 15 '11 at 14:10

2

As I said in my comment, use bitset, it has all the binary operators you need, for example:

std::bitset<96> foo(0xFF1); // set bit 1 & bits 5-12

// to test
std::bitset<96> mask(0xFF0); // are bits 5-12 set?
// test, yes the global & operator will create a temporary
if ((mask & foo) == mask)
{
  // do stuff...
}

answered Nov 15 '11 at 14:10

Nim

33,299
2
62
101

+1 Thank you for opening up this avenue. I like bitsets because they're cleanly encapsulated. What is problematic is how much time std containers spend thrashing about on the stack (construction and destruction included); I've seen this with vectors. Problem for me is this a *very* data-access intensive application -- a realtime octree raycaster, where the data structure in question is the octree node, traversed millions of times per second during raycasting. So if there is a POD type I can use with ease, I will. I'm going to have to test jalf's solution before I can say for sure. – Engineer Nov 15 '11 at 14:21
If you want speed, do then do this the C way. That is, a chunk of memory with a bunch of defines setup for masking and setting each field. – kbyrd Nov 15 '11 at 14:31
@kbyrd -- Can you provide a reference showing this approach? How exactly is the "chunk" defined? – Engineer Nov 15 '11 at 14:33
1

This is traditional C pointer and array stuff. See the answer from @jalf. He just allocated an array on the stack, but you could do it with malloc or whatever. – kbyrd Nov 15 '11 at 14:36
@kbyrd, I assumed as jalf's, but wasn't sure. Thanks. I actually had already found the define approach you mention, yesterday: http://www.coranac.com/documents/working-with-bits-and-bitfields/ – Engineer Nov 15 '11 at 14:43
About speed. You can look at bitset file. For example, stl that go with gcc, define bitset as structure with array inside. So no heap allocation, actually it is as efficient as suggested solution on old plain C, but with syntax sugar. – fghj Nov 15 '11 at 15:12

96-bit long bitfield with non-octet aligned subfields

3 Answers3