I have the following UINT8 variables:

UINT8 var1 = 0b00000001; //0000 0001
UINT8 var2 = 0b00000011; //0000 0011
UINT8 var3 = 0b00000111; //0000 0111
UINT8 var4 = 0b00001111; //0000 1111

I would like to pack these four UINT8 variables into one UINT32 variable with the following value:

UINT32 packed = 0b00000001000000110000011100001111; //00000001 00000011 00000111 00001111

Would the following code do it correctly and safely?

UINT32 packed = (var1<<24) + (var2<<16) + (var3<<8) + var4;
M. A. Kishawy

1 Answer

Short answer, yes.

I'm not going to worry about how you wrote your binary numbers. I will enter them in hex; for binary literals, see this related SO question: Can I use a binary literal in C or C++?

#include "stdafx.h"   // you are using devstudio
#include <Windows.h>  // you are using windows types
#include <iostream>   // I print out the result
#include <bitset>     // I use bitset to print the binary string

int main()
{
    UINT8 var1 = 0x01; //0000 0001
    UINT8 var2 = 0x03; //0000 0011
    UINT8 var3 = 0x07; //0000 0111
    UINT8 var4 = 0x0F; //0000 1111

    UINT32 bigvar = (var1 << 24) + (var2 << 16) + (var3 << 8) + var4;
    std::cout << std::bitset<32>(bigvar) << std::endl;
}

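Your math is correct and safe. The bytes are independently declared, so you don't have to worry about byte order. The shifts all fit within the 32 bits of the result, so nothing overflows. One subtlety: although the variables are unsigned, each UINT8 operand is promoted to int before the shift, so var1 << 24 is a shift on a signed int. As the comments below work out, this still gives the right result on the usual platforms with a 32-bit two's complement int. I generated: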

00000001000000110000011100001111
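
If you would rather not rely on how a signed shift behaves (the point debated in the comments below), one option is to widen each byte to an unsigned 32-bit type before shifting, and to combine the parts with bitwise OR. The following is only a sketch: pack_bytes is a hypothetical helper name, and std::uint8_t / std::uint32_t are assumed stand-ins for UINT8 / UINT32 so that it builds without Windows.h.

```cpp
#include <cstdint>

// Assumption: std::uint8_t / std::uint32_t stand in for the Windows
// UINT8 / UINT32 typedefs so the sketch builds without <Windows.h>.
// pack_bytes is a hypothetical helper name, not from the original post.
constexpr std::uint32_t pack_bytes(std::uint8_t b3, std::uint8_t b2,
                                   std::uint8_t b1, std::uint8_t b0)
{
    // Widen each byte to 32-bit unsigned before shifting, so every shift
    // is performed on an unsigned type and no sign bit is involved.
    return (static_cast<std::uint32_t>(b3) << 24) |
           (static_cast<std::uint32_t>(b2) << 16) |
           (static_cast<std::uint32_t>(b1) << 8)  |
            static_cast<std::uint32_t>(b0);
}
```

With the values from the question, pack_bytes(0x01, 0x03, 0x07, 0x0F) yields 0x0103070F, which is the same bit pattern as the output above.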

Alternatively, you could read a stored 32-bit integer as 4 bytes and reconstruct the number from them, but assuming a fixed order would not be portable, because the bytes are sometimes stored in reverse order. For example, in TIFF, you read a header value that tells you whether to put var1 first and count up, or var4 first and count down. Byte order is something you have to watch out for in almost all practical applications of combining a bunch of bytes into a larger integer type. Look up big-endian and little-endian for more info.
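
To make the byte-order point concrete, here is a minimal sketch of assembling four bytes in either order. ByteOrder and assemble_u32 are hypothetical names for illustration, not taken from any TIFF library, and the caller is assumed to have already worked out the order from the file header.

```cpp
#include <cstdint>

// Hypothetical names for illustration; not taken from any TIFF library.
enum class ByteOrder { BigEndian, LittleEndian };

// Combine four bytes, given in the order they appear in the file,
// into a 32-bit value according to the declared byte order.
constexpr std::uint32_t assemble_u32(std::uint8_t first, std::uint8_t second,
                                     std::uint8_t third, std::uint8_t fourth,
                                     ByteOrder order)
{
    return order == ByteOrder::BigEndian
        // Big-endian: the first byte in the file is the most significant.
        ? (static_cast<std::uint32_t>(first)  << 24) |
          (static_cast<std::uint32_t>(second) << 16) |
          (static_cast<std::uint32_t>(third)  << 8)  |
           static_cast<std::uint32_t>(fourth)
        // Little-endian: the first byte in the file is the least significant.
        : (static_cast<std::uint32_t>(fourth) << 24) |
          (static_cast<std::uint32_t>(third)  << 16) |
          (static_cast<std::uint32_t>(second) << 8)  |
           static_cast<std::uint32_t>(first);
}
```

The same four bytes 01 03 07 0F give 0x0103070F when treated as big-endian and 0x0F070301 when treated as little-endian, which is why the order has to come from the file format rather than from the machine you happen to run on.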

Kenny Ostrom
  • 4
    *"Your math is correct and safe."* No it's not. `var1 << 24` is potentially UB. – Baum mit Augen Oct 11 '16 at 14:15
  • 1
    I will have to look into that. I have production code similar to that in a tiff manipulation library. When can it fail? – Kenny Ostrom Oct 11 '16 at 14:18
  • 2
    It invokes UB for all `var1 > 127` by shifting onto the sign bit (given `int` is 32 bit on your platform). – Baum mit Augen Oct 11 '16 at 14:19
  • 1
    Even though they are all unsigned? – Kenny Ostrom Oct 11 '16 at 14:20
  • 3
    `24` is signed. – Baum mit Augen Oct 11 '16 at 14:20
  • 1
    @BaummitAugen isn't the problem rather that `var1 << 24` gets extended to the fitting type , which might or might not be signed, so that the `+` operation is a bad idea? – Marcus Müller Oct 11 '16 at 14:23
  • 1
    No. Nothing gets "extended" here. `var1 << 24` is of type `int` (unless `sizeof(int) == 1`). But enough of the comment chat I guess, if no one else comes around I will look for a dupe or write a complete and correct answer tonight. – Baum mit Augen Oct 11 '16 at 14:26
  • I would appreciate it if you could back that up. I have stable code using that exact same expression on the TIFF format and for reading PGP encryption headers, on Windows and POSIX (going back 15-20 years on the TIFF data). If there's any risk here, then I will hunt down the issue and change it just in case some future compiler doesn't work the same. Also, I like harold's comment above about using bitwise or. – Kenny Ostrom Oct 11 '16 at 14:33
  • From http://stackoverflow.com/questions/8713490/why-result-of-unsigned-char-unsigned-char-is-not-unsigned-char the values are promoted to a UINT32 first, then shifted, then added; which is not UB – UKMonkey Oct 11 '16 at 14:49
  • 1
    5.8.2 [expr.shift] Looks like I'm safe. There is a section about UB, but that only applies if the first operand is signed (and some other conditions). – Kenny Ostrom Oct 11 '16 at 14:54
  • 2
    Turns out I was wrong, the result is merely implementation defined, and on most platforms (namely, those that implement 2s complement integer math like gcc, clang, icc, MSVC..) we will always get the right result. However, there is more going on than you realize. For starters, `var1 << 24` is equivalent to `int(var1) << 24`, i.e. your LHS *is* signed, but 5.8/2 fixes the overflow for us. If you want to see the details, feel free to ask a new question. The answer to the question the Monkey linked explains too. Anyways, downvote removed. – Baum mit Augen Oct 11 '16 at 21:25
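
The promotion described in the comments above can be checked at compile time. This is a small sketch only: ShiftResult is a hypothetical alias introduced for illustration, and std::uint8_t is assumed as a stand-in for UINT8.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical alias: the type of (byte << 24) once the 8-bit left
// operand has undergone integral promotion.
// Assumption: std::uint8_t stands in for the Windows UINT8 typedef.
using ShiftResult = decltype(std::uint8_t{1} << 24);

// The shift result has the promoted type of the left operand, which is
// plain (signed) int, not an unsigned type.
static_assert(std::is_same<ShiftResult, int>::value,
              "an 8-bit unsigned operand is promoted to int before shifting");
```

If this compiles, the shift in the answer is being carried out on an int, which is what the comment thread eventually concludes.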