
I'm writing a function that converts a bitset to an int/uint value, considering that the bitset could have fewer bits than the target type.

Here is the function I wrote:

template <typename T,size_t count> static T convertBitSetToNumber( const std::bitset<count>& bitset )
{
    T result;
    #define targetSize (sizeof( T )*CHAR_BIT)
    if ( targetSize > count )
    {
        // if the bitset is 0xF00, converting it as 0x0F00 would lose sign information (0xF00 is negative, while 0x0F00 is positive)
        // This is because the sign bit is on the left.
        // So we add zeros (4 bits) on the right and convert 0xF000; later, we divide by 16 (2^4) to preserve sign and value

        size_t missingbits = targetSize - count;

        std::bitset<targetSize> extended;
        extended.reset(); // set all to 0
        for ( size_t i = 0; i != count; ++i )
        {
            if ( i < count )
                extended[i+missingbits] = bitset[i];
        }

        result = static_cast<T>( extended.to_ullong() );

        result = result >> missingbits;

        return result;
    }
    else
    {
        return static_cast<T>( bitset.to_ullong() );
    }
}

And the "test program":

uint16_t val1 = Base::BitsetUtl::convertBitSetToNumber<uint16_t,12>( std::bitset<12>( "100010011010" ) );
// val1 is 0x089A
int16_t val2 = Base::BitsetUtl::convertBitSetToNumber<int16_t,12>( std::bitset<12>( "100010011010" ) );
// val2 is 0xF89A

Note: See the comment exchange with Ped7g: the code above is right and preserves the sign bit, doing the 12->16 bit conversion correctly for signed or unsigned values. But if you are looking for how to offset 0xABC0 to 0x0ABC on a signed object, the answers could help you, so I don't delete the question.

See how the program works when using uint16_t as the target type:

uint16_t val = 0x89A0; // 1000100110100000
val = val >> 4;        // 0000100010011010

However, it fails when using int16_t, because 0x89A0 >> 4 is 0xF89A instead of expected 0x089A.

int16_t val = 0x89A0; // 1000100110100000
val = val >> 4;       // 1111100010011010

I don't understand why the >> operator sometimes inserts 0 and sometimes 1, and I can't find out how to safely do the final operation of my function (result = result >> missingbits; must be wrong at some point...)

jpo38
    Can you make `#define targetSize (sizeof( T )*CHAR_BIT)` less evil? What's wrong with a `constexpr`? –  Sep 22 '16 at 14:18
  • @FaceyMcFaceFace: Sure I can, But I doubt that will solve my problem...;-) – jpo38 Sep 22 '16 at 14:22
  • But it makes your code so much clearer. –  Sep 22 '16 at 14:22
    "if bitset is 0xF00, converting it as 0x0F00 will lose sign information" ... that's very weird usage of bitset. Bitset is designed to be used more like flag-per-bit, so having one bit as "sign" flag and other composing numerical value looks to me like a mild abuse of bitset, you should probably use one of int#_t already at higher level, not having bitset at all. – Ped7g Sep 22 '16 at 14:36
  • And now I'm totally confused, when `(count <= targetSize)`, isn't the `return static_cast( bitset.to_ullong() );` returning already your desired value? And in other case the `static_cast( bitset.to_ullong()>>(count-targetSize) );` would do? – Ped7g Sep 22 '16 at 14:43
  • @Ped7g: static_cast won't work for signed types: int12 sign bit is not at the right position to have static_cast work fine. – jpo38 Sep 22 '16 at 14:52
    So want the sign, or you don't want the sign? Citation from question: "However, it fails when using int16_t, because 0x89A0 >> 4 is 0xF89A instead of expected 0x089A." => 0xF89A looks fine to me, when you want the sign. I think you should probably create some table with 12b bitset to 8/16/signed/unsigned results, what you really expect. BTW I still believe you are slightly misusing the bitset on higher level, and probably your code can be written in simpler way with int types. – Ped7g Sep 22 '16 at 15:00
    @Ped7g: True, 0xF89A is actually what I need, I figured this out when I applied answers below, 0x089A is obviously not the result expected...people focused on the final question ("how to make 0x89A0 shift to 0x089A"), when actually the question was dumb....but they deserve earned reputation, so I don't want to delete the question now....I'll update it. – jpo38 Sep 22 '16 at 15:12

4 Answers


This is called arithmetic shifting. On signed types, the most significant bit is the sign bit. When you shift a negative value to the right, the upper bits are set to 1, so that the result is still a negative number. (The result is a division by 2^n, where n is the number of bits shifted, with rounding towards negative infinity.)

To avoid that, use an unsigned type. Shifting them uses logical shifting, which will set the upper bits to 0.

Change this line:

result = result >> missingbits;

to

result = static_cast<T>(static_cast<uintmax_t>(result) >> missingbits);

(uintmax_t is the maximum width unsigned integer type supported by the compiler)

or use std::make_unsigned as Joachim Pileborg wrote in his answer.

alain
  • How can I `use an unsigned type` within my function when the function is called with a `signed type`? – jpo38 Sep 22 '16 at 14:26
  • you could cast it to a large enough unsigned type, do the shifting and cast it back. – alain Sep 22 '16 at 14:28
  • How can I do this in a generic way? To have my function work with any type (int8, int16, int32, int64) as it is meant to? – jpo38 Sep 22 '16 at 14:31
  • I think the largest type available would be `uint64_t`, that should work for all of `int64_t` down to `int8_t`. – alain Sep 22 '16 at 14:33
  • @jpo38 actually `uintmax_t` is the largest unsigned integer type supported by the compiler, updated my answer. – alain Sep 22 '16 at 15:12

It's because shifting is an arithmetic operation, and that promotes the operands to int, which will do sign extension.

I.e. promoting the signed 16-bit integer (int16_t) 0x89a0 to a 32-bit signed integer (int) causes the value to become 0xffff89a0, which is the value that is shifted.

See e.g. this arithmetic operation conversion reference for more information.

You should cast the variable (or value) to an unsigned integer (i.e. uint16_t in your case):

val = static_cast<uint16_t>(val) >> 4;

If the type is not really known, for example if it's a template argument, then you can use std::make_unsigned:

val = static_cast<typename std::make_unsigned<T>::type>(val) >> 4;
Some programmer dude
  • Good, but my function is template, `uint16_t` `static_cast` will mess things up if the function is invoked with T = int32_t...I need a more generic solution and could not find any. Is there a way to pickup signed type from an unsigned one within a template function? – jpo38 Sep 22 '16 at 14:32
  • Use `std::make_unsigned`. Updated answer. – Some programmer dude Sep 22 '16 at 14:36
  • upvoted for `std::make_unsigned`, but int promotion is not the important point since when you shift an `int`, no promotion happens but the upper bits are still set to 1. – alain Sep 22 '16 at 14:46
  • @alain The OP is using an `int16_t` which on most platforms is the same as `short` which will get promoted to `int`. – Some programmer dude Sep 22 '16 at 15:30

As already said, when your type is signed the >> operator is performing an arithmetic shifting. So in addition to the solution suggested above, in case you need to do logical shifting you can always simply use a mask as below:

    int mask = 1 << (targetSize-missingbits-1);
    mask |= mask - 1;
    result = (result >> missingbits) & mask;

In this case the mask has its missingbits most significant bits set to 0 and the rest set to 1; in your case, the top 4 bits are 0 and the rest are 1. Then, performing an & operation clears the top missingbits bits of your result, which is what you need:

0xF89A & 0x0FFF = 0x089A

See it working in a live example.

A. Sarid

The original code with the loop looks a bit complicated to me. I would write it like this (I mean as the second option, after I would somehow inexplicably fail to avoid the usage of std::bitset and templates completely, for such a simple thing as bit-size adjustment of data, in the first place):

#include <bitset>
#include <climits>

template <typename T,size_t count> static T convertBitSetToNumber( const std::bitset<count>& bitset )
{
    constexpr size_t targetSize = sizeof( T )*CHAR_BIT;
    if (targetSize == count) return static_cast<T>(bitset.to_ullong());
    if (targetSize < count) return static_cast<T>(bitset.to_ullong() >> (count - targetSize));
    return static_cast<T>(bitset.to_ullong() << (targetSize - count)) >> (targetSize - count);
}

// Example test producing from 0x089A bitset unsigned/signed values:
// 16b: 89a f89a | 8b: 89 89 | 32b: 89a fffff89a

#include <iostream>

int main()
{
    const std::bitset<12> testBitset("100010011010");
    std::hex(std::cout);
    std::cout << convertBitSetToNumber<uint16_t,12>( testBitset ) << std::endl;
    std::cout << convertBitSetToNumber<int16_t,12>( testBitset ) << std::endl;
    std::cout << (0xFF & convertBitSetToNumber<uint8_t,12>( testBitset )) << std::endl;
    std::cout << (0xFF & convertBitSetToNumber<int8_t,12>( testBitset )) << std::endl;
    std::cout << convertBitSetToNumber<uint32_t,12>( testBitset ) << std::endl;
    std::cout << convertBitSetToNumber<int32_t,12>( testBitset ) << std::endl;
}
Ped7g
  • Right, your code behaves exactly as mine (passed all my tests) and is actually simpler. Thanks! – jpo38 Sep 23 '16 at 06:17
  • When I compile your code, I get the warning `warning C4293: '>>': shift count negative or too big, undefined behavior`. So is this code really safe? – jpo38 Oct 11 '16 at 06:52
  • @jpo38 well, the negative shift obviously can't happen at all. So the "too big" shift remains... depends a lot on target size, and `bitset` count. I can imagine scenarios where you can break this code by using too big bitset, or some huge type, like maybe `bitset<1>` vs `uint64_t`. When in doubt, test it (that's why I added tests in `main`). But I don't think the compiler can do static analysis to that extent, so the warning may be false as well. It certainly should work at least as good as original code. – Ped7g Oct 11 '16 at 07:09
  • @jpo38 I think the main source of warning is shifting by `size_t` value. Anyway, as long as you keep the target/bitset difference under 32, this is 100% safe. (under 64 on 64b systems should be safe too, but difference under 32 should work everywhere). In your question the 12 bit bitset vs 16 bit target type is safe, that's only 4 bits difference. – Ped7g Oct 11 '16 at 07:22