6

I have 3 unsigned bytes that are coming over the wire separately.

[byte1, byte2, byte3]

I need to convert these to a signed 32-bit value but I am not quite sure how to handle the sign of the negative values.

I thought of copying the bytes to the upper 3 bytes in the int32 and then shifting everything to the right but I read this may have unexpected behavior.

Is there an easier way to handle this?

The representation is using two's complement.

phuclv
Beto

7 Answers

11

You could use:

uint32_t sign_extend_24_32(uint32_t x) {
    const int bits = 24;
    uint32_t m = 1u << (bits - 1);
    return (x ^ m) - m;
}

This works because:

  • if the old sign was 1, then the XOR makes it zero and the subtraction will set it and borrow through all higher bits, setting them as well.
  • if the old sign was 0, the XOR will set it, the subtract resets it again and doesn't borrow so the upper bits stay 0.
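As a rough illustration of the two cases above, here is a sketch that first assembles the three wire bytes and then applies the XOR-and-subtract step. The helper name `from_3_bytes` and the byte order (byte1 as the least significant byte) are assumptions made for the example; the question does not state them.

```cpp
#include <cstdint>

// Hypothetical helper: byte1 is ASSUMED to be the least significant byte.
inline int32_t from_3_bytes(uint8_t byte1, uint8_t byte2, uint8_t byte3) {
    // Assemble the raw 24-bit pattern in an unsigned type.
    uint32_t x = (uint32_t(byte3) << 16) | (uint32_t(byte2) << 8) | uint32_t(byte1);
    const uint32_t m = 1u << 23;       // the sign bit of a 24-bit value
    uint32_t extended = (x ^ m) - m;   // unsigned arithmetic wraps modulo 2^32
    // Converting an out-of-range value to int32_t is implementation-defined
    // before C++20; two's complement implementations keep the bit pattern.
    return static_cast<int32_t>(extended);
}
```

Doing the arithmetic in `uint32_t` and converting only at the end keeps the wrap-around well defined.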

Templated version

template<class T>
T sign_extend(T x, const int bits) {
    T m = 1;
    m <<= bits - 1;
    return (x ^ m) - m;
}
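A possible usage sketch for the templated version, instantiated with signed types of two different widths. The wrapper names `decode_24bit` and `decode_12bit` are hypothetical, and the input is assumed to hold the raw field in its low bits with all higher bits zero.

```cpp
#include <cstdint>

template<class T>
T sign_extend(T x, const int bits) {
    T m = 1;
    m <<= bits - 1;        // shift happens in type T, not in int
    return (x ^ m) - m;
}

// Hypothetical wrappers; the bits above the field are ASSUMED to be zero.
inline int32_t decode_24bit(int32_t raw) { return sign_extend<int32_t>(raw, 24); }
inline int64_t decode_12bit(int64_t raw) { return sign_extend<int64_t>(raw, 12); }
```

With the upper bits clear, neither the XOR nor the subtraction can overflow the signed type, as long as `bits` is smaller than the width of `T`.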
harold
  • 1
    Another benefit of bit-twiddling in this way is that you're not limited to a 32-bit int - it works just as well on a 64-bit int for example. I'd change the type, perhaps to a template parameter, and make `bits` a function parameter as well. – Mark Ransom Mar 01 '17 at 15:54
  • @MarkRansom good points, is that approximately what you meant? – harold Mar 01 '17 at 16:02
  • I need a signed 32 not unsigned though – Beto Mar 01 '17 at 16:04
  • @Beto you can just use signed types here, at least I see no way for it to break (unless `bits` is something unreasonable). Makes the rest of the code more dangerous though. – harold Mar 01 '17 at 16:14
  • 1
    Perfect. I like the way you split `m` assignment into two parts to make sure the shifting occurs on the proper type. – Mark Ransom Mar 01 '17 at 16:37
2

Assuming both representations are two's complement, simply

upper_byte = (Signed_byte(incoming_msb) >= 0? 0 : Byte(-1));

where

using Signed_byte = signed char;
using Byte = unsigned char;

and upper_byte is a variable representing the missing fourth byte.

The conversion to Signed_byte is formally implementation-dependent, but a two's complement implementation doesn't have a choice, really.
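A minimal sketch of how the computed upper byte might be combined with the three received bytes. The function name and the parameter names `incoming_lsb` and `incoming_mid` are made up for illustration; only `incoming_msb` and `upper_byte` come from the snippet above.

```cpp
#include <cstdint>

using Signed_byte = signed char;
using Byte = unsigned char;

// Hypothetical helper; parameter names other than incoming_msb are illustrative.
inline int32_t combine_with_upper_byte(Byte incoming_lsb, Byte incoming_mid, Byte incoming_msb) {
    // 0x00 for non-negative values, 0xFF for negative ones.
    Byte upper_byte = (Signed_byte(incoming_msb) >= 0 ? 0 : Byte(-1));
    // Build the 32-bit pattern with unsigned shifts, then convert once.
    uint32_t bits = (uint32_t(upper_byte) << 24) | (uint32_t(incoming_msb) << 16)
                  | (uint32_t(incoming_mid) << 8) | uint32_t(incoming_lsb);
    return static_cast<int32_t>(bits);  // assumes a two's complement int32_t
}
```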

Cheers and hth. - Alf
1

You could let the compiler handle the sign extension itself. Assuming that the least significant byte is byte1 and the most significant byte is byte3:

int val = (signed char) byte3;                // C guarantees the sign extension
val *= 65536;                                 // move the byte to its final place (same as << 16, but well defined for negative values)
val |= ((int) (unsigned char) byte2) << 8;    // place the second byte
val |= (int) (unsigned char) byte1;           // and the least significant one

I have used C style cast here when static_cast would have been more C++ish, but as an old dinosaur (and Java programmer) I find C style cast more readable for integer conversions.

Serge Ballesta
1

This is a pretty old question, but I recently had to do the same (while dealing with 24-bit audio samples), and wrote my own solution for it. It uses a similar principle to this answer, but is more generic and potentially generates better code after compiling.

template <size_t Bits, typename T>
inline constexpr T sign_extend(const T& v) noexcept {
    static_assert(std::is_integral<T>::value, "T is not integral");
    static_assert((sizeof(T) * 8u) >= Bits, "T is smaller than the specified width");
    if constexpr ((sizeof(T) * 8u) == Bits) return v;
    else {
        using S = struct { signed Val : Bits; };
        return reinterpret_cast<const S*>(&v)->Val;
    }
}

This has no hard-coded math, it simply lets the compiler do the work and figure out the best way to sign-extend the number. With certain widths, this can even generate a native sign-extension instruction in the assembly, such as MOVSX on x86.

This function assumes you copied your N-bit number into the lower N bits of the type you want to extend it to. So for example:

int16_t a = -42;
int32_t b{};
memcpy(&b, &a, sizeof(a));
b = sign_extend<16>(b);

Of course it works for any number of bits, extending it to the full width of the type that contained the data.

notadam
0

You can use a bitfield

template<size_t L>
inline int32_t sign_extend_to_32(const char *x)
{
  struct {int32_t i: L;} s;
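  memcpy(&s, x, 3);  // assumes little endian, with the bit-field stored in the low bytes of s
  return s.i;
  // or, without memcpy (cast to unsigned char so the bytes are not sign-extended on their own):
  // return s.i = ((unsigned char)x[2] << 16) | ((unsigned char)x[1] << 8) | (unsigned char)x[0];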
}

Easy and no undefined behavior invoked

int32_t r = sign_extend_to_32<24>(your_3byte_array);

Of course, copying the bytes to the upper 3 bytes of the int32 and then shifting everything to the right, as you thought, is also a good idea. There's no undefined behavior if you use memcpy like above. An alternative is reinterpret_cast in C++ or a union in C, which avoids the memcpy. However, the result is implementation-defined, because a right shift of a negative value is not guaranteed to be a sign-extending shift (although almost all modern compilers implement it that way).
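One way the "upper 3 bytes, then move right" idea could be sketched without relying on how `>>` treats negative values is to divide instead of shifting. The function name `shift_down_from_top` and the least-significant-first byte order are assumptions for the example.

```cpp
#include <cstdint>

// Hypothetical helper: b points at three bytes, ASSUMED least significant first.
inline int32_t shift_down_from_top(const unsigned char *b) {
    // Place the 24-bit value in the upper three bytes using unsigned shifts.
    uint32_t top = (uint32_t(b[2]) << 24) | (uint32_t(b[1]) << 16) | (uint32_t(b[0]) << 8);
    // Reinterpret as signed; the wire sign bit is now the int32_t sign bit
    // (implementation-defined before C++20, two's complement in practice).
    int32_t as_signed = static_cast<int32_t>(top);
    // The low 8 bits are zero, so dividing by 256 is exact and gives the same
    // result as an arithmetic right shift by 8.
    return as_signed / 256;
}
```

Since C++20 the right shift of a signed value is defined as an arithmetic shift, so `as_signed >> 8` would also do there; the division is just the version that does not depend on the language revision.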

phuclv
  • Placing a value in a bit field so small that the extracted value is not equal, must surely be implementation-defined behavior. Still I like this. :) – Cheers and hth. - Alf Mar 01 '17 at 15:09
  • How do you compile this? I get some "error: address of bit-field requested". Works if I remove that `.i24` in the memcpy, maybe that's what you meant? – harold Mar 01 '17 at 15:34
  • @harold yes. This was made up without compiling – phuclv Mar 01 '17 at 15:43
0

Here's a method that works for any bit count, even if it's not a multiple of 8. This assumes you've already assembled the 3 bytes into an integer value.

const int bits = 24;
int mask = (1 << bits) - 1;
bool is_negative = (value & ~(mask >> 1)) != 0;
value |= -is_negative & ~mask;
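To illustrate the "any bit count" claim, the same steps can be wrapped in a function and tried with a 12-bit field. The wrapper name `sign_extend_masked` is hypothetical, and the sketch assumes the bits above the field are zero and that `bits` is between 1 and 30.

```cpp
#include <cstdint>

// Hypothetical wrapper; ASSUMES 0 < bits < 31 and that the bits of `value`
// above the field are zero.
inline int32_t sign_extend_masked(int32_t value, int bits) {
    int32_t mask = (int32_t(1) << bits) - 1;          // low `bits` bits set
    bool is_negative = (value & ~(mask >> 1)) != 0;   // is the field's top bit set?
    // -1 (all ones) when negative, 0 otherwise; keep only the bits above the field.
    value |= -int32_t(is_negative) & ~mask;
    return value;
}
```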
Mark Ransom
  • Why so complicated though? You could just `(value ^ m) - m` with `m = 1 << (bits - 1)` – harold Mar 01 '17 at 15:27
  • @harold if you think you have a better answer go ahead and answer the question yourself. I'm having a hard time convincing myself that it works, but if it does you'll get a +1 from me. – Mark Ransom Mar 01 '17 at 15:40
  • Fair enough, I just thought maybe there's a reason for it – harold Mar 01 '17 at 15:47
0

Assuming your 24-bit value is stored in a variable int32_t val, you can easily extend the sign as follows:

val = (val << 8) >> 8;
anicic