How do you concatenate 4 UINT8 variables into one UINT32 variable?

Question

I have the following UINT8 variables:

UINT8 var1 = 0b00000001; //0000 0001
UINT8 var2 = 0b00000011; //0000 0011
UINT8 var3 = 0b00000111; //0000 0111
UINT8 var4 = 0b00001111; //0000 1111

I would like to pack these four UINT8 variables into one UINT32 variable with the following value:

UINT32 result = 0b00000001000000110000011100001111; //00000001 00000011 00000111 00001111

Would the following code do it correctly and safely?

UINT32 result = (var1<<24) + (var2<<16) + (var3<<8) + var4;

Ok, for this one we really need to know the underlying types. But it's most likely broken. — Baum mit Augen, Oct 11 '16 at 13:47
@BaummitAugen what do you mean by the underlying types? I'm storing byte values in the variables. Is that what you're referring to? – M. A. Kishawy, Oct 11 '16 at 13:48
You seem to be under the impression that `00000001` et al are binary literals. Which C++ compiler are you using? — Michael, Oct 11 '16 at 13:49
`UINT8` and `UINT32` are not standard types, so you need to tell us what they are actually defined to be. — Baum mit Augen, Oct 11 '16 at 13:49
@Michael I'm using whatever comes with Microsoft Visual Studio. — M. A. Kishawy, Oct 11 '16 at 13:50
No, it wouldn't, because `var2 = 00000011` has the value nine, not three, because a leading 0 indicates octal notation — Marcus Müller, Oct 11 '16 at 13:50
@BaummitAugen https://msdn.microsoft.com/en-us/library/windows/desktop/aa383751(v=vs.85).aspx — Simon Kraemer, Oct 11 '16 at 13:51
Hinted at by several, but not said outright. As of VS 2015 Preview (and GCC for a while longer), you can write your binary literals as `UINT8 var1 = 0b00000001`. Otherwise, you can't. The easiest might be `UINT8 var2 =3; // 0000 0011`. Other solutions at: http://stackoverflow.com/questions/2611764/can-i-use-a-binary-literal-in-c-or-c — slim, Oct 11 '16 at 13:55
@M.A.Kishawy At some point you need to just try these things and print the results. Also reading through language documentation and such helps (e.g. yes, the left bitshift operator does what it's supposed to, if that's what you're asking here). Btw the correct terminology (in Java as well) is more along the lines of "pack four UINT8's into a UINT32" or something like that, "daisy chain" isn't usually used in this context. — Jason C, Oct 11 '16 at 14:00
@Michael C++14 https://en.wikipedia.org/wiki/C%2B%2B14#Binary_literals — Jason C, Oct 11 '16 at 14:02
@JasonC: The original code posted did not use any prefixes. They were edited in later. — Michael, Oct 11 '16 at 14:09
Just to ping you: I was wrong about this code being wrong. It will work fine on the usual platforms like gcc, clang, MSVC, icc... — Baum mit Augen, Oct 11 '16 at 21:50
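
The octal point raised in the comments above can be checked at compile time. The following is only a minimal sketch: it assumes a C++14 compiler (for the `0b` prefix) and uses `std::uint8_t` from `<cstdint>` as a stand-in for the Windows `UINT8` typedef. The variable names are hypothetical and do not come from the question.

```cpp
#include <cstdint>

// Assumption: std::uint8_t stands in for the Windows UINT8 typedef.
constexpr std::uint8_t octal_looking  = 00000011;   // leading 0 => octal: 1*8 + 1 = 9
constexpr std::uint8_t binary_literal = 0b00000011; // C++14 binary literal: 3
constexpr std::uint8_t hex_literal    = 0x03;       // hexadecimal: 3

// Compile-time checks of the three spellings.
static_assert(octal_looking == 9, "a leading 0 makes the literal octal");
static_assert(binary_literal == 3, "the 0b prefix gives the intended bit pattern");
static_assert(hex_literal == binary_literal, "hex and binary spell the same value");
```

If the compiler predates binary literals, the hex spelling is probably the simplest portable substitute.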

score 10 · Accepted Answer · edited May 23 '17 at 12:16

Short answer, yes.

I'm not going to worry about how you wrote your binary numbers. I'll enter them in hex; for ways to write binary literals, see the related SO question "Can I use a binary literal in C or C++?"

#include "stdafx.h"   // you are using devstudio
#include <Windows.h>  // you are using windows types
#include <iostream>   // I print out the result
#include <bitset>     // I use bitset to print the binary string

int main()
{
    UINT8 var1 = 0x01; //0000 0001
    UINT8 var2 = 0x03; //0000 0011
    UINT8 var3 = 0x07; //0000 0111
    UINT8 var4 = 0x0F; //0000 1111

    UINT32 bigvar = (var1 << 24) + (var2 << 16) + (var3 << 8) + var4;
    std::cout << std::bitset<32>(bigvar) << std::endl;
}

Your math is correct and safe. The bytes are declared independently, so you don't have to worry about byte order. The types are all unsigned, so there are no UB issues with the sign bit. Every shifted value fits within 32 bits, so nothing overflows. The program printed:

00000001000000110000011100001111

Alternatively, you could have read a 32-bit integer in as 4 bytes and reconstructed the 32-bit number from them, but that would not be portable, because some formats and platforms store the bytes in reverse order. For example, in TIFF, a header value tells you whether to put var1 first and count up, or var4 first and count down. Byte order is something you have to watch out for in almost every practical application that combines a bunch of bytes into a larger integer type. Look up big-endian and little-endian for more info.
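
To make the byte-order point concrete, here is a rough sketch of combining four bytes in either order. The helper names `from_big_endian` and `from_little_endian` are hypothetical (they are not part of any TIFF library), and `std::uint8_t`/`std::uint32_t` are assumed as stand-ins for `UINT8`/`UINT32`.

```cpp
#include <cstdint>

// Hypothetical helper: b0 is the first byte read and is treated as the
// most significant byte of the result (big-endian order).
constexpr std::uint32_t from_big_endian(std::uint8_t b0, std::uint8_t b1,
                                        std::uint8_t b2, std::uint8_t b3)
{
    return (static_cast<std::uint32_t>(b0) << 24)
         | (static_cast<std::uint32_t>(b1) << 16)
         | (static_cast<std::uint32_t>(b2) << 8)
         |  static_cast<std::uint32_t>(b3);
}

// Hypothetical helper: b0 is the first byte read and is treated as the
// least significant byte (little-endian order), so the arguments are reversed.
constexpr std::uint32_t from_little_endian(std::uint8_t b0, std::uint8_t b1,
                                           std::uint8_t b2, std::uint8_t b3)
{
    return from_big_endian(b3, b2, b1, b0);
}

// The same four bytes give different values depending on the order chosen.
static_assert(from_big_endian(0x01, 0x03, 0x07, 0x0F) == 0x0103070Fu,
              "first byte is most significant");
static_assert(from_little_endian(0x01, 0x03, 0x07, 0x0F) == 0x0F070301u,
              "first byte is least significant");
```

A reader for a format like TIFF would pick one of the two functions based on the header value, rather than relying on the byte order of the machine running the code.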

answered Oct 11 '16 at 14:09

Kenny Ostrom

*"Your math is correct and safe."* No it's not. `var1 << 24` is potentially UB. – Baum mit Augen Oct 11 '16 at 14:15
I will have to look into that. I have production code similar to that in a tiff manipulation library. When can it fail? – Kenny Ostrom Oct 11 '16 at 14:18
It invokes UB for all `var1 > 127` by shifting onto the sign bit (given `int` is 32 bit on your platform). – Baum mit Augen Oct 11 '16 at 14:19
Even though they are all unsigned? – Kenny Ostrom Oct 11 '16 at 14:20
`24` is signed. – Baum mit Augen Oct 11 '16 at 14:20
@BaummitAugen isn't the problem rather that `var1 << 24` gets extended to the fitting type, which might or might not be signed, so that the `+` operation is a bad idea? – Marcus Müller Oct 11 '16 at 14:23
No. Nothing gets "extended" here. `var1 << 24` is of type `int` (unless `sizeof(int) == 1`). But enough of the comment chat I guess, if no one else comes around I will look for a dupe or write a complete and correct answer tonight. – Baum mit Augen Oct 11 '16 at 14:26
I would appreciate if you can support that. I have stable code using that exact same code on tiff format and reading pgp encryption headers, on windows and posix (going back 15-20 years on the tiff data). If there's any risk here, then I will hunt down the issue and change it just in case some future compiler doesn't work the same. Also, I like harold's comment above about using bitwise or. – Kenny Ostrom Oct 11 '16 at 14:33
From http://stackoverflow.com/questions/8713490/why-result-of-unsigned-char-unsigned-char-is-not-unsigned-char the values are promoted to a UINT32 first, then shifted, then added, which is not UB – UKMonkey Oct 11 '16 at 14:49
5.8.2 [expr.shift] Looks like I'm safe. There is a section about UB, but that only applies if the first operand is signed (and some other conditions). – Kenny Ostrom Oct 11 '16 at 14:54
Turns out I was wrong, the result is merely implementation defined, and on most platforms (namely, those that implement two's complement integer math, like gcc, clang, icc, MSVC...) we will always get the right result. However, there is more going on than you realize. For starters, `var1 << 24` is equivalent to `int(var1) << 24`, i.e. your LHS *is* signed, but 5.8/2 fixes the overflow for us. If you want to see the details, feel free to ask a new question. The answer to the question the Monkey linked explains too. Anyways, downvote removed. – Baum mit Augen Oct 11 '16 at 21:25
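
The promotion discussed in this thread can be observed directly, and one way to sidestep the question entirely is to widen each byte to an unsigned 32-bit type before shifting. The sketch below is only an illustration under some assumptions: `std::uint8_t`/`std::uint32_t` stand in for `UINT8`/`UINT32`, `int` is assumed to be no wider than 32 bits (as on the usual platforms named above), and `pack_bytes` is a hypothetical name. It combines the parts with bitwise OR, as suggested in the comments, instead of `+`.

```cpp
#include <cstdint>
#include <type_traits>

// Integral promotion: the uint8_t left operand of << is promoted to int,
// so the type of the whole shift expression is int, not an unsigned type.
static_assert(std::is_same<decltype(std::uint8_t{1} << 24), int>::value,
              "uint8_t << int has type int after integral promotion");

// Hypothetical helper: convert every byte to std::uint32_t first, so each
// shift is carried out in unsigned arithmetic and never touches a sign bit.
constexpr std::uint32_t pack_bytes(std::uint8_t b3, std::uint8_t b2,
                                   std::uint8_t b1, std::uint8_t b0)
{
    return (static_cast<std::uint32_t>(b3) << 24)
         | (static_cast<std::uint32_t>(b2) << 16)
         | (static_cast<std::uint32_t>(b1) << 8)
         |  static_cast<std::uint32_t>(b0);
}

// The values from the question, plus a top byte above 127.
static_assert(pack_bytes(0x01, 0x03, 0x07, 0x0F) == 0x0103070Fu,
              "matches the bit pattern asked for in the question");
static_assert(pack_bytes(0xFF, 0x00, 0x00, 0x00) == 0xFF000000u,
              "a top byte above 127 is packed without a signed shift");
```

The explicit casts cost nothing at run time on typical compilers and remove the need to reason about which standard version or platform the code is built for.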
