In the following code:
short = ((byte2 << 8) | (byte1 & 0xFF))
What is the purpose of & 0xFF? Because sometimes, I see the above code written as:
short = ((byte2 << 8) | byte1)
And that seems to work fine too.
If byte1 is an 8-bit integer type then it's pointless. If it is wider than 8 bits, the mask essentially gives you the lowest 8 bits of the value:
0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
& 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1
-------------------------------
0 0 0 0 0 0 0 0 0 1 0 1 0 1 0 1
ANDing an integer with 0xFF leaves only the least significant byte. For example, to get the low byte of a short s, you can write s & 0xFF. This is typically referred to as "masking". If byte1 is either a single-byte type (like uint8_t) or is already in the range 0 to 255 (and as a result is all zeroes except for the least significant byte), there is no need to mask out the higher bits, as they are already zero.
See Patrick Schlüter's (tristopia's) answer below for the case where you may be working with signed types. When doing bitwise operations, I recommend working only with unsigned types.
The danger of the second expression arises if the type of byte1 is char. In that case, some implementations treat char as signed char, which will result in sign extension when the expression is evaluated.
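#include <stdio.h>
int main(void)
{
    signed char byte1 = 0x80;  /* typically stored as -128 (implementation-defined conversion) */
    signed char byte2 = 0x10;
    unsigned short value1 = ((byte2 << 8) | (byte1 & 0xFF));  /* byte1 masked after promotion to int */
    unsigned short value2 = ((byte2 << 8) | byte1);           /* sign-extended byte1 overwrites the high byte */
    printf("value1=%hu %hx\n", value1, value1);
    printf("value2=%hu %hx\n", value2, value2);
    return 0;
}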
This will print:
value1=4224 1080     (right)
value2=65408 ff80    (wrong!)
I tried it on gcc v3.4.6 on Solaris SPARC 64 bit and the result is the same with byte1 and byte2 declared as char.
TL;DR
The masking is to avoid implicit sign extension.
EDIT: I checked, it's the same behaviour in C++.
EDIT2: As requested, an explanation of sign extension.
Sign extension is a consequence of the way C evaluates expressions. C has a rule called integer promotion: it implicitly converts all types narrower than int to int before doing the evaluation. Let's see what happens to our expression:
unsigned short value2 = ((byte2 << 8) | byte1);
Suppose byte1 is a variable containing the bit pattern 0xFF. If char is unsigned, that value is interpreted as 255; if it is signed, it is -1. When doing the calculation, C extends the value to the size of an int (generally 16 or 32 bits). If the variable is unsigned, we keep the value 255, and the bit pattern of that value as an int is 0x000000FF. If it is signed, we want the value -1, whose bit pattern is 0xFFFFFFFF. The sign has been extended to the size of the temporary used to do the calculation.
ORing that temporary into the result therefore sets all the upper bits and yields the wrong value.
In x86 assembly this is done with the movsx instruction (movzx for zero extension). Other CPUs have other instructions for it (the 6809 had SEX).
Assuming your byte1 is a byte (8 bits), when you do a bitwise AND of a byte with 0xFF, you get the same byte back.
So byte1 is the same as byte1 & 0xFF.
Say byte1 is 01001101; then byte1 & 0xFF = 01001101 & 11111111 = 01001101 = byte1.
If byte1 is of some other type, say a 4-byte integer, a bitwise AND with 0xFF leaves you with the least significant byte (8 bits) of byte1.
The byte1 & 0xff ensures that only the 8 least significant bits of byte1 can be non-zero.
If byte1 is already an unsigned type that has only 8 bits (e.g., char in some cases, or unsigned char in most), it won't make any difference and is completely unnecessary.
If byte1 is a type that's signed or has more than 8 bits (e.g., short, int, long), and any of the bits except the 8 least significant is set, then there will be a difference (i.e., it'll zero those upper bits before ORing with the other variable, so this operand of the OR affects only the 8 least significant bits of the result).
& 0xFF by itself only ensures that if bytes are longer than 8 bits (allowed by the language standard), the rest are ignored.
And that seems to work fine too?
If the result ends up greater than SHRT_MAX, converting it to short gives an implementation-defined result, so you cannot rely on the value you get. In that respect both will work equally poorly.