
I was reading a book that implements a do-it-yourself DNS message reader, and it checks whether a particular field is set. There is one piece of code in the book that I can't understand well.

const int qdcount = (msg[10] << 8) + msg[11];

msg ==> its elements are of type char (i.e. 8 bits each)

qdcount ==> is supposed to be a 16-bit field containing the number of DNS queries (made of the 2 bytes msg[10] and msg[11] together)

So how does this code work? If msg[10] = 1001 0001, for example, left-shifting it by 8 is supposed to give (1000 0000), i.e. UB, and then any calculation done with it will give a wrong answer. Suppose msg[11] = 0010 1111. The result of the calculation is 1000 0000 + 0010 1111, right? So how exactly does this line of code work?

Hulk
KMG
  • Please don't _describe_ the type of `msg` but _show_ the declaration. – Jabberwocky Jul 16 '20 at 10:57
  • before the shift is done the operands are converted to `int` so `10010001` is converted to `00000000 00000000 00000000 10010001` (or `11111111 11111111 11111111 10010001`) and only after that is the left shift performed – pmg Jul 16 '20 at 10:57
  • see [Bitwise_shift_operators](https://en.cppreference.com/w/cpp/language/operator_arithmetic#Bitwise_shift_operators): "Integral promotions are performed on both operands." – Hulk Jul 16 '20 at 11:02
  • @pmg I can't understand: if promotions are done automatically, when will this operation result in UB? – KMG Jul 16 '20 at 11:13
  • Is plain `char` on your machine a signed type or an unsigned type? It matters. – Jonathan Leffler Jul 16 '20 at 11:28
  • Left shift invokes UB, basically, if it leads to `int` overflow. `1 << 48` is UB on machines with a 32-bit `int`. – pmg Jul 16 '20 at 11:31
  • @pmg: Left-shifting or right-shifting by an amount which isn't less than the number of bits in the type being shifted invokes Undefined Behavior, whether or not it would cause overflow. C89 defined the behavior of smaller left shifts of negative values, or those that would produce overflow, as sensibly-defined behavior on two's-complement implementations that didn't use padding bits, but as less-sensibly-defined behavior on other platforms. C99 recharacterized such shifts as Undefined Behavior on all platforms. – supercat Jul 16 '20 at 23:11

2 Answers


There are at least 2 independent considerations.

  • Is char signed or unsigned?

  • Is int 16 bits or wider?

    (msg[10] << 8) + msg[11];
    

Most of the concern depends on:

The result of E1 << E2 is E1 left-shifted E2 bit positions ...
If E1 has a signed type and nonnegative value, and E1 × 2^E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.
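
As a rough illustration of that rule, the sketch below checks whether a left shift of two `int` operands has a defined result. The helper name `int_shl_is_defined` is made up for this example, and the width computation assumes `int` has no padding bits. The shift-count range check reflects the separate requirement that the count be non-negative and less than the width of the promoted left operand.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper (not a standard function): returns true when
 * e1 << e2 has a defined result for int operands. */
bool int_shl_is_defined(int e1, int e2)
{
    /* Width of int in bits; assumes int has no padding bits. */
    const int width = (int)(sizeof(int) * CHAR_BIT);

    if (e2 < 0 || e2 >= width)
        return false;               /* shift count out of range */
    if (e1 < 0)
        return false;               /* negative left operand */
    return e1 <= (INT_MAX >> e2);   /* e1 * 2^e2 must fit in int */
}
```

With a 32-bit `int`, `int_shl_is_defined(0x91, 8)` is true, while the same call on a 16-bit `int` would be false, which matches the cases discussed next.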

char is signed, int any size

msg[10] is promoted to int and shifted left 8. This is fine when the value of msg[10] is non-negative, and undefined behavior when it is negative, as left shifting a negative value is UB.

char is unsigned, int wider than 16 bit

msg[10] is promoted to int and shifted left 8. This is fine for all values of msg[10].

char is unsigned, int is 16 bit

msg[10] is promoted to int and shifted left 8. This is fine when msg[10] < 128; otherwise it is UB, because the shift reaches the sign bit and the mathematical result is not representable in int.
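
To make the difference between the signed and unsigned cases concrete, here is a small sketch of what promotion does to the byte 1001 0001 from the question. The helper names are invented for illustration; the -111 value assumes an 8-bit, two's-complement `signed char`.

```c
/* Hypothetical helpers: return the value a byte has after conversion to int.
 * The conversion preserves the value, so a negative signed char stays negative. */
int promote_signed_byte(signed char c)
{
    return c;   /* e.g. -111 stays -111 (bit pattern 1001 0001 in 8 bits) */
}

int promote_unsigned_byte(unsigned char c)
{
    return c;   /* e.g. 0x91 becomes 145, never negative */
}
```

The same bit pattern therefore yields either -111 or 145 before the shift happens, and only the non-negative value can be shifted left safely.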


Best to use unsigned types when shifting.

// char msg[100];
// const int qdcount = (msg[10] << 8) + msg[11];

unsigned char msg[100];
const unsigned qdcount = ((unsigned) msg[10] << 8) + msg[11];
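
The same idea can be wrapped in a small function. This is only a sketch: `read_u16_be` is a hypothetical name, not something from the book or a library, and it assumes the caller guarantees that `offset + 1` is inside the buffer.

```c
#include <stddef.h>

/* Hypothetical helper: read a 16-bit big-endian field starting at msg[offset].
 * Both bytes are converted to unsigned before shifting, so no signed
 * arithmetic is involved even when int is only 16 bits wide.
 * The caller must ensure offset + 1 is a valid index. */
unsigned read_u16_be(const unsigned char *msg, size_t offset)
{
    return ((unsigned)msg[offset] << 8) | (unsigned)msg[offset + 1];
}
```

With the bytes from the question (0x91 followed by 0x2F), `read_u16_be(msg, 10)` gives 0x912F.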
chux - Reinstate Monica
  • If char will always be converted to int, what is the point of using unsigned char or signed char? What protection does it provide? – KMG Jul 16 '20 at 12:02
  • @Khaled Gaber When the `char` value is negative, the left shift is UB. With `unsigned char`, the value promoted to `int` is not negative. As long as the left shift does not shift into the `int` sign bit, all is well. – chux - Reinstate Monica Jul 16 '20 at 12:07
  • Yes, got it now. I missed that the sign is the most significant bit of the number :) – KMG Jul 16 '20 at 12:13
  • @KhaledGaber "The integer promotions are performed on each of the operands." does not mean, that the argument is promoted to `int`. It may also be promoted to `unsigned int` or larger types in the required signedness. Please take a look at the topic of "integer promotion" to understand the implications. – vlad_tepesch Jul 16 '20 at 12:37
  • @vlad_tepesch Details: In the case of shift operators, the operands go through the usual integer promotions, which convert operands narrower than `int` to `int` or `unsigned`, but not to wider types. In the case of `a shift b`, the type of `b` does not affect the type of `a`. `int << long` results in an `int`. `long << unsigned long` is `long`: "The type of the result is that of the promoted left operand." – chux - Reinstate Monica Jul 16 '20 at 14:14

From the C11 (ISO/IEC 9899:2011) draft, paragraph 6.5.7.4:

[...] E1 << E2 [...]
If E1 has a signed type and nonnegative value, and E1 × 2^E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.

So yes, left-shifting a signed value may result in undefined behavior.

Another point, which is the actual source of your confusion, is:

The integer promotions are performed on each of the operands.

That means that before shifting, your operands are converted to int, so shifting by 8 cannot overflow by shifting bits into the sign bit (assuming int is wider than 16 bits, i.e. sizeof(int) * CHAR_BIT > 16).
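
One way to see the promotion directly is to ask the compiler for the type of the shift expression. The sketch below uses C11 `_Generic`; the macro and function names are invented for this example, and it assumes a typical platform where `int` can represent every `char` value (otherwise the promoted type would be `unsigned int`).

```c
/* Evaluates to 1 when the expression has type int, 0 otherwise (C11 _Generic). */
#define IS_INT(expr) _Generic((expr), int: 1, default: 0)

/* Hypothetical check: the result of shifting a char has type int. */
int shift_result_is_int(void)
{
    char c = 1;
    return IS_INT(c << 8);
}

/* Hypothetical check: a plain char object itself does not have type int. */
int plain_char_is_int(void)
{
    char c = 1;
    return IS_INT(c);
}
```

On such a platform `shift_result_is_int()` returns 1 and `plain_char_is_int()` returns 0, so the 8-bit value is widened before any bits are moved.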

vlad_tepesch