How to handle irregular bit sizes

Question

I am working on a project that reads values of varying bit sizes from a binary file. For example, one line of the file (in hex) may look like "FF C0 AA 00 FE". From this line, the fields I need are 4 bits, 7 bits, 11 bits, 8 bits, etc. The problem I am having is that some of the extracted numbers are signed and others are unsigned (the 4- and 7-bit values may be signed, the 11- and 8-bit values unsigned).

I was originally extracting the fields by masking and shifting the hex values into a C++ char/short/int holding 4, 7, 11 or 8 bits. However, if I look at the 4-bit value in binary, it shows up as 00001011. This number should be negative because of its leading 1 (only the 4 bits 1011 count), but C++ treats it as positive since it looks at all 8 bits.

Another example for clarification: I might extract 11 bits from the file (11100101101), but in C++ short format it appears as (0000011100101101). It should be negative because the leading bit of the 11 is 1.

I was wondering what an ideal way to handle this would be. I was considering making a bit/byte class; the only problem is the varying bit sizes (4, 7, 11, 8).

Thanks, hopefully this makes sense. I am fairly new to working with binary data in C++, so there may be a built-in function I haven't seen.

"FF C0 AA 00 FE" is a sequence of 4bit, 7bit, 11bit, 8bit,...data? — Ben, Jul 22 '14 at 16:15
What about using a [bitfield](http://en.cppreference.com/w/cpp/language/bit_field) structure? Or a [`std::bitset`](http://en.cppreference.com/w/cpp/utility/bitset) seems also useful for your case. — πάντα ῥεῖ, Jul 22 '14 at 16:17
If FF C0 AA 00 FE is from the file, in binary it appears as (11111111 11000000 10101010...), then I need to take the first 4 bits as one number, so 1111. The next number is 7 bits and it would be 1111110 (4 bits from FF, 3 from C0), the 11 bits would be 00000101010, etc. — user2840470, Jul 22 '14 at 16:20
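The accepted answer assumes the fields have already been pulled out of the byte stream. One way to do that extraction for the layout described in the comment above is a small bit reader that walks the buffer most significant bit first. This is a minimal sketch, assuming fields of at most 32 bits packed back to back with no padding; the class name `BitReader` and its `read` method are hypothetical, not a standard API:

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical helper: reads fields of arbitrary width (1..32 bits) from a
// byte buffer, taking the most significant bit of each byte first.
class BitReader {
public:
    explicit BitReader(std::vector<std::uint8_t> bytes) : bytes_(std::move(bytes)) {}

    // Returns the next `bits` bits as an unsigned value.
    std::uint32_t read(unsigned bits) {
        if (bits == 0 || bits > 32)
            throw std::invalid_argument("bits must be in 1..32");
        if (pos_ + bits > bytes_.size() * 8)
            throw std::out_of_range("not enough bits left");
        std::uint32_t result = 0;
        for (unsigned i = 0; i < bits; ++i, ++pos_) {
            std::uint8_t byte = bytes_[pos_ / 8];
            unsigned bit = (byte >> (7 - pos_ % 8)) & 1u;  // MSB-first within the byte
            result = (result << 1) | bit;                  // append the bit on the right
        }
        return result;
    }

private:
    std::vector<std::uint8_t> bytes_;
    std::size_t pos_ = 0;  // index of the next bit to read
};
```

Reading the question's bytes FF C0 AA 00 FE with widths 4, 7, 11 and 8 gives 15, 126, 42 and 128. The reader deliberately returns unsigned values; fields that should be signed still need a sign-extension step such as the `make_signed` function in the accepted answer.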

Mark Ransom · Accepted Answer · score 4 · 2014-07-22T16:26:28.637

I will assume you can already extract the bits you want by shifting, masking, and or-ing together different pieces, and that the problem is just handling the sign bit.

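int make_signed(int value, int bits)
{
    // Assumes 1 <= bits <= 31 and that value holds only the low `bits` bits.
    if (value & (1 << (bits - 1)))           // is the field's sign bit set?
        value |= -1 & ~((1 << bits) - 1);    // yes: set every bit above the field
    return value;
}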

This starts by testing what would be the sign bit in your n-bit number. If it's set, all the upper bits of the int are set as well: the code starts with a value of -1 (all bits set in two's complement), masks off the bottom `bits` bits, and ORs the result into `value`.

edited Jul 22 '14 at 16:26

answered Jul 22 '14 at 16:20

Mark Ransom

+1, but you can simply shift twice to get sign extension (assuming a signed type for `value`): `value = value << (32 - bits) >> (32 - bits)` for a 32-bit `int`. EDIT: Actually, scratch that; that would only work in environments where the right shift is an arithmetic one, which is implementation-specific. – Cameron Jul 22 '14 at 16:24
@Cameron thanks for pointing out that I missed something very important. I'm sticking with my original formula with one tiny mod. – Mark Ransom Jul 22 '14 at 16:27
Just out of curiosity: Couldn't he just reset the sign bit and multiply the value by -1? – Not a real meerkat Jul 22 '14 at 16:39
@cassiorenan No, because it wouldn't be the same value. For example the 4-bit sequence `1011` should be -5, not -3. – Mark Ransom Jul 22 '14 at 16:44
Of course. I was thinking completely wrong. More like multiply by -1, clear all "overflowing" MSB bits, multiply by -1. – Not a real meerkat Jul 22 '14 at 16:54
Thanks Mark, rather simple implementation (if you know what you are doing) that handles my problem – user2840470 Jul 22 '14 at 16:59
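Mark Ransom's counterexample follows from how an n-bit two's-complement field is read. Writing its bits as b_{n-1} (the sign bit) down to b_0, and u for the same bits read as an unsigned number, the signed value is:

```latex
v = -b_{n-1}\,2^{n-1} + \sum_{i=0}^{n-2} b_i\,2^{i} = u - b_{n-1}\,2^{n}
```

For `1011` (n = 4, u = 11) this gives 11 - 16 = -5, whereas clearing the sign bit and negating gives -(011) = -3. The right-hand form also shows that sign extension amounts to subtracting 2^n whenever the top bit of the field is set.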

score 1 · Answer 2 · edited Jul 22 '14 at 16:19

This has been done for you: see std::bitset. Just have an array of them.

edited Jul 22 '14 at 16:19

Dale Wilson

answered Jul 22 '14 at 16:17

Ed Heal

std::bitset does not address the user's needs. He is trying to store small integers, not collections of bits. – Dale Wilson Jul 22 '14 at 16:21
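Dale Wilson's objection can be made concrete: `std::bitset` holds the bits fine, but `to_ulong()` and `to_ullong()` always return them as an unsigned number, so the sign still has to be applied by hand. A minimal sketch; the helper name `bitset_to_signed` is hypothetical, and it assumes the field is at most 62 bits wide so the result fits in `long long`:

```cpp
#include <bitset>
#include <cstddef>

// Hypothetical helper: interprets an N-bit std::bitset as a two's-complement value.
template <std::size_t N>
long long bitset_to_signed(const std::bitset<N>& b)
{
    static_assert(N >= 1 && N <= 62, "field must fit in long long");
    long long value = static_cast<long long>(b.to_ullong());  // unsigned reading
    if (b[N - 1])               // top bit of the field is the sign bit
        value -= 1LL << N;      // two's complement: subtract 2^N
    return value;
}
```

For example, `std::bitset<11>(0x72D).to_ulong()` is 1837, while `bitset_to_signed` on the same bitset gives -211.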

barak manos · Answer 3 · 2014-07-22T16:31:16.870

Here is how you can achieve your purpose for the two examples given in your question:

// signed char, because plain char is unsigned on some platforms
signed char src = 0x0B;                      // 00001011
signed char dst = (signed char)(src<<4)>>4;  // 11111011 = -5

short src = 0x072D;             // 0000011100101101
short dst = (short)(src<<5)>>5; // 1111111100101101 = -211

In general, you can implement a function for signed values and a function for unsigned values:

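#include <limits.h>

// Sign-extends the low numOfBits bits of val (1 <= numOfBits <= bits in an int).
// Before C++20, left-shifting a signed value can be undefined and >> on a
// negative value is implementation-defined (arithmetic on mainstream compilers).
signed int GetSignedVal(signed int val,int numOfBits)
{
    int shift = sizeof(val)*CHAR_BIT-numOfBits;
    return (val<<shift)>>shift;   // move the field's sign bit to the top, then shift back
}

// Keeps only the low numOfBits bits of val (shifts on unsigned are logical).
unsigned int GetUnsignedVal(unsigned int val,int numOfBits)
{
    int shift = sizeof(val)*CHAR_BIT-numOfBits;
    return (val<<shift)>>shift;
}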
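If you want to avoid shifting signed values altogether, a different technique is the xor-and-subtract trick listed in Sean Anderson's "Bit Twiddling Hacks" (sign extending from a variable bit-width). This is a minimal sketch, assuming a 32-bit `int`/`unsigned` and 1 <= numOfBits <= 31; the name `GetSignedValXor` is hypothetical:

```cpp
// Hypothetical alternative to GetSignedVal using xor-and-subtract.
// All bit manipulation happens on unsigned values, so no shift touches a sign
// bit and the result does not depend on how >> treats negative numbers.
int GetSignedValXor(int val, int numOfBits)
{
    const unsigned mask = (1u << numOfBits) - 1u;             // low numOfBits bits
    const unsigned sign = 1u << (numOfBits - 1);              // the field's sign bit
    const unsigned field = static_cast<unsigned>(val) & mask; // drop stray high bits
    // If the sign bit is set, (field ^ sign) - sign equals field - 2^numOfBits;
    // otherwise it equals field. Both operands fit in int for numOfBits <= 31.
    return static_cast<int>(field ^ sign) - static_cast<int>(sign);
}
```

On the question's values, `GetSignedValXor(0x0B, 4)` is -5 and `GetSignedValXor(0x072D, 11)` is -211; because the field is masked first, `GetSignedValXor(0xFFFF, 11)` is -1.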

score 0 · Answer 4 · edited May 23 '17 at 12:21

There is a rarely-used feature in C/C++ called a bit field that addresses your problem.

    struct HodgePodge {
            bool oneBooleanBit:1;
            signed int fourBitsOfSignedInteger:4;   // explicitly signed: a plain int bit-field may be unsigned in C
            unsigned int sixUnsignedIntegerBits:6;
    };

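The 11 bits would fit in two bytes, but `sizeof(HodgePodge)` is implementation-defined and is typically larger (commonly 4 or more, because the `int` bit-fields give the struct the alignment of `int`).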

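The compiler generates the shifting, masking, and sign-extension code for you. Be aware, however, that the standard does NOT guarantee the layout of the bits within HodgePodge, so you cannot map the struct directly onto the bytes read from the file; you still have to assign each field yourself.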

This link gives details

Bit fields have been around forever, but hardly anybody uses them. Be aware that you may have to educate your coworkers (or comment appropriately).
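Because the compiler does the sign extension when a signed bit field is read, a bit field can also serve as a sign-extension helper for values extracted by other means (this trick also appears in "Bit Twiddling Hacks"). A minimal sketch; the name `sign_extend_bitfield` is hypothetical, and it assumes 1 <= B <= the bit width of `int`:

```cpp
// Hypothetical helper: sign-extends the low B bits of x by storing x in a
// B-bit signed bit-field and reading it back. When x does not fit in B signed
// bits, the stored value is implementation-defined; mainstream compilers keep
// the low B bits, which is what this relies on.
template <int B>
int sign_extend_bitfield(int x)
{
    struct { signed int field : B; } s;  // B-bit signed bit-field
    s.field = x;                          // in practice keeps the low B bits
    return s.field;                       // read back as a sign-extended int
}
```

With this, `sign_extend_bitfield<4>(0x0B)` yields -5 and `sign_extend_bitfield<11>(0x072D)` yields -211, without any hand-written masks.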
