6

I came across this piece of C code:

typedef int gint
// ...
gint a, b;
// ...
a = (b << 16) >> 16;

For ease of notation let's assume that b = 0x11223344 at this point. As far as I can see it does the following:

  • b << 16 will give 0x33440000
  • >> 16 will give 0x00003344

So, the 16 highest bits are discarded.

Why would anyone write (b << 16) >> 16 if b & 0x0000ffff would work as well? Isn't the latter form more understandable? Is there any reason to use bitshifts in a case like this? Is there any edge-case where the two could not be the same?

  • What has C to do with "instructions"? – Kerrek SB Aug 12 '15 at 12:14
  • 1
    @KerrekSB at first a thought a not very optimal compiler might use more instructions for the bitshifts, but after reconsidering, that should not be the case. Thanks. –  Aug 12 '15 at 12:16
  • Maybe the programmer wanted the code to be less readable! That's the only thing I can think of! – CinCout Aug 12 '15 at 12:16
  • This code shaves off the 16 highest bits of `b`, whatever its width is or becomes. Not sure why it would be useful, but yeah. – Quentin Aug 12 '15 at 12:18
  • @Quentin "whatever its width" - that's correct. The `uint32_t` actually comes from a typedef. So, if another type was there (say `uint64_t`), it would still work. Could that be it? –  Aug 12 '15 at 12:20
  • May you tell us where you saw it? – edmz Aug 12 '15 at 12:20
  • @CamilStaps it's the only thing I can think of, but I don't see how that can be useful. – Quentin Aug 12 '15 at 12:22
  • @black yeah, but it's not really interesting. It's a large project with few users. Anyway, you could head over to http://clean.cs.ru.nl/Download_Clean and download the Linux Complete Sources. In that tarball, navigate to `/src/libraries/ObjectIO/ObjectIO/OS Linux/Linux_C_12`. Then it's `cCrossCallWindows_121.c`, line 641. I now noticed that the `uint32_t` already was edited, originally, it was `int` (it is a `gint`). –  Aug 12 '15 at 12:23
  • 1
    The behaviour can change depending on whether it's signed or unsigned. For `uint32_t`, this is identical to the `&` mask. For `int`, you might get sign extension on the right shift. What is `gint`? – Useless Aug 12 '15 at 12:25
  • @Useless it is from `qtypes.h` from the GLIB library: `typedef int gint`. This is part of a GTK+ app. –  Aug 12 '15 at 12:26
  • 3
    Try it with -1, or 0x8000, and note that & is *not* the same. Presumably taking advantage of the undefined sign extension behavior was intentional. – Hans Passant Aug 12 '15 at 12:26
  • 5
    That last change is pretty radical from unit32_t to int, that is pretty big mistake to make and now that you have answers it makes a difference. I wouldn't add an answer to this question until the OP clarifies what the deal is. – Shafik Yaghmour Aug 12 '15 at 12:29
  • 1
    [Right shifting negative numbers in C](http://stackoverflow.com/q/1857928/995714) – phuclv Aug 12 '15 at 12:44

3 Answers3

3

Assuming that the size of int is 32 bits, then there is no need to use shifts. Indeed, bitwise & with a mask would be more readable, more portable and safer.

It should be noted that left-shifting on negative signed integers invokes undefined behavior, and that left-shifting things into the sign bits of a signed integer could also invoke undefined behavior. C11 6.5.7 (emphasis mine):

The result of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are filled with zeros. If E1 has an unsigned type, the value of the result is E1 × 2E2, reduced modulo one more than the maximum value representable in the result type. If E1 has a signed type and nonnegative value, and E1 × 2E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.


(The only possible rationale I can think of, is some pre-mature optimization for a 16-bit CPU that comes with a poor compiler. Then the code would be more efficient if you broke up the arithmetic in 16 bit chunks. But on such a system, int would most likely be 16 bits, so the code wouldn't make any sense then.)

As a side note, it doesn't make any sense to use the signed int type either. The most correct and safe type for this code would have been uint32_t.

Lundin
  • 195,001
  • 40
  • 254
  • 396
  • 1
    I like this answer for throwing the book at OP while they change their question to the dismay of the answerers, but I think you should criticise the assumption that `int` is 32-bits, certainly not "portable". Nevertheless, the `&` as it is is preferable to the shifts to avoid undefined behaviour. – Veltas Aug 12 '15 at 12:42
  • The signed left shift is undefined _by the standard_. Since part of the point of UB is to allow implementation leeway, it can still be defined _by the platform_. In practice, gcc has [well-defined behaviour including sign extension](https://gcc.gnu.org/ml/gcc/2000-04/msg00152.html) – Useless Aug 12 '15 at 12:47
  • Another rationale you might use is that for an unsigned left operand the upper 16-bits are cleared regardless of the size of the operand. I can't think of a usecase for this immediately where you might change the size of the operand. Either way I'm deleting my own answer, since it's redundant after OP's edits. – Veltas Aug 12 '15 at 13:06
1

For an unsigned integral type (eg, the uint32_t we first thought was being used),

(b << 16) >> 16

is identical to b & (1<<16 - 1).

For a signed integral type though,

(b << 16)

could become negative (ie, the low int16_t would have been considered negative when taken on its own), in which case

(b << 16) >> 16

will (probably) still be negative due to sign extension. In that case, it isn't the same as the & mask, because the top bits will be set instead of zero.


Either this behaviour is deliberate (in which case the commented-out typedef is misleading), or it's a bug. I can't tell without reading the code.

Oh, and the shift behaviour in both directions is how I'd expect gcc to behave on x86, but I can't comment on how portable it is outside that. The left-shift may be UB as Lundin points out, and sign extension on the right-shift is implementation defined.

Useless
  • 64,155
  • 6
  • 88
  • 132
1

So, the 16 highest bits are discarded.

They are not. Though it is formally implementation-defined how right-shift operation is performed on signed types, most compilers do it so as to replicate the sign bit.

Thus, the 16 highest bits are filled by the replicated value of the 15th bit as the result of this expression.

ach
  • 2,314
  • 1
  • 13
  • 23
  • Right-shift on signed types with non-negative values is defined in the spec, actually. – davmac Aug 12 '15 at 12:40
  • It is only implementation-defined in case the type is both signed and negative, which isn't the case here. In order to get there with this code, you would first have to predict the outcome of the undefined behavior of left shifting data into the sign bits. – Lundin Aug 12 '15 at 12:40
  • @Lundin: The question does not outrule negative results for right-shifting. – Veltas Aug 12 '15 at 12:46
  • @davmac Yeah, the correct way of putting it would be thus: *formally, the operation of right shift is only partly defined on domains of signed integral types; thus, the full definition depends on implementation*. – ach Aug 12 '15 at 12:48
  • @Lundin, I cannot find any wording in the Standard as to left-shifting data into the sign bit causing undefined behavior. Could you provide some substantiation? – ach Aug 12 '15 at 12:50
  • @AndreyChernyakhovskiy See the quoted part in my answer. "If E1 has a signed type and nonnegative value, and E1 × 2E2 is **representable** in the result type..." In the case of for example `INT_MAX << 1`, the result is (arguably) not representable in an `int` and therefore undefined behavior. – Lundin Aug 12 '15 at 12:54
  • @AndreyChernyakhovskiy actually the behavior is _undefined_ by the standard if the value is negative. Thus "the full definition depends on the implementation" would be true only if the implementation makes additional guarantees beyond what the standard requires. While many implementations may produce a seemingly predictable result for such a calculation it would be unwise to rely on this. – davmac Aug 12 '15 at 13:00
  • @Lundin, Curious, I looked up the C++ Standard and it has no such clause. Had no idea C differs here. Apparently, yet another unaccountable tiny discrepancy between the two languages. – ach Aug 12 '15 at 13:00
  • @AndreyChernyakhovskiy Curious. If the standard doesn't mention it at all, that's also undefined behavior, just harder to spot. It would seem that C++03 has identical text as C90, while C99 and C11 explicitly lists this as UB. This has apparently been corrected in C++11, which has the same text as C11. – Lundin Aug 12 '15 at 13:23