9

I have a 16 bits unsigned variable. I need to split it in 8 bits chunks.

Is doing the following enough:

chunk_lsb = (uint8)variable;
chunk_msb = (uint8)(variable >> 8);

Or should I use a mask:

chunk_lsb = (uint8)(variable & 0xFFu);
chunk_msb = (uint8)((variable >> 8) & 0xFFu);

I know that both approaches work, I'm just looking for the best way to do it, if there is one. Maybe there's none and just use the cast to reduce calculations is the best way? What do you guys think?

Soyding Mete
  • 137
  • 8
  • I think both will generate same binary code. As the first solution is more readable, I would use this one. You just need to be sure that uint8 is 8-but on all platforms (i guess it is) – jaudo Feb 25 '19 at 12:47
  • Thank you @jaudo, that's also the direction I'm taking. – Soyding Mete Feb 25 '19 at 12:54
  • 11
    Sidenote: Standard `stdint.h` types such as `uint8_t` should be preferred instead of creating your own. – user694733 Feb 25 '19 at 13:01

4 Answers4

7

It isn't clear what type variable is. Without that specified, we can only speculate.

But in general, you should avoid bit shifting on signed integer types, as that leads to various forms of poorly-defined behavior. This in turn means that you have to be careful with small integer types too, because they get promoted to signed int. See Implicit type promotion rules.

The specific case of (uint8)((variable >> 8) & 0xFFu); is safe if variable is unsigned. Otherwise it is unsafe, since right-shifting a negative value leads to implementation-defined behavior (arithmetic or logical shift).

variable << 8 will invoke undefined behavior on 16 bit systems in case variable is a small integer type, or an int16_t.

The safest, most portable way no matter left/right shift is therefore this:

chunk_lsb = variable;
chunk_msb = ((unsigned int)variable >> 8);

Though you might want to be overly explicit in order to silence all compiler warnings:

chunk_lsb = (uint8_t) (variable & 0xFFu);
chunk_msb = (uint8_t) ( (unsigned int)variable>>8 & 0xFFu );
Lundin
  • 195,001
  • 40
  • 254
  • 396
  • AFAICT `(variable >> 8) & 0xFFu` will give the same result regardless of whether the shift is arithmetic or logical, as long as `int` is at least 16 bits wide (which I believe the standard guarantees it to be). – Ilmari Karonen Feb 25 '19 at 15:37
  • 1
    @IlmariKaronen The size of the variable doesn't matter for that, only the signedness. In case the value is negative, you get implementation-defined behavior. One example of such broken code would be `unsigned char u = 0; ... ~u >> n;` – Lundin Feb 25 '19 at 15:41
  • I agree that it's unpredictable, insofar as the standard doesn't require the implementation-defined result of right-shifting a negative signed integer to be _either_ an arithmetic _or_ a logical shift. However, on implementations where it _is_ one of those two, shifting right by _n_ bits and then masking off (at least) the _n_ highest bits should produce the same result regardless of which kind of shift it is. Thus, your parenthetical remark in the third paragraph of your answer seems a bit misleading. – Ilmari Karonen Feb 25 '19 at 16:18
  • @Lundin, it is perhaps worthwhile to note that your first set of recommendations assumes that `chunk_lsb` and `chunk_msb` themselves have type equivalent to `uint8_t`. That isn't clear from the question -- the OP's "8 bits chunks" could be stored in variables having a wider type, or a signed one. – John Bollinger Feb 25 '19 at 16:29
  • @IlmariKaronen: If the authors of the Standard had required either zero-fill or sign-extension behavior, but there had been some platform where some other behavior would have been more useful, mandating zero-fill or sign-extension behavior could have made C less useful on that platform. The only situations where it should better whether the Standard mandates those two particular behaviors would be if such a platform actually exists, or a compiler writer feels like being deliberately weird. The former case would be best accommodated by *not* mandating the behavior... – supercat Feb 25 '19 at 17:38
  • ...and the latter case couldn't be usefully accommodated at all, so there was no perceived need to spend time on the issue. – supercat Feb 25 '19 at 17:40
  • 1
    Disagree about using `(uint8_t)` cast **and** the mask in `chunk_lsb = (uint8_t) (variable & 0xFFu);`. The cast is always sufficient to silence warnings. The mask is better programming. But both is [WET](https://en.wikipedia.org/wiki/Don%27t_repeat_yourself#DRY_vs_WET_solutions). Aside from that I really like the answer. – chux - Reinstate Monica Feb 26 '19 at 02:07
  • @Lundin, John Bollinger: I actually did precise that they are **unsigned** variables, and in that particular case _chunk_lsb_ and _chunk_msb_ are uint8 (uint8_t, hence the cast) sorry if that was unclear – Soyding Mete Feb 26 '19 at 08:06
4

Since uint8 is unsigned, you don't have to do the masking:

6.3.1.3 Signed and unsigned integers

  1. When a value with integer type is converted to another integer type other than _ Bool , if the value can be represented by the new type, it is unchanged.
  2. Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type. 60)
  3. Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.

However, most likely both will result in the same compiler output. I usually add the mask because it makes clear what code is supposed to do, and makes the cast unnecessary.

user694733
  • 15,208
  • 2
  • 42
  • 68
  • 1
    Except, C has nasty integer promotion, so if you try to do the same thing and left shift, the program may implode in undefined behavior. In this particular case we get away since it is right shift of a positive number. – Lundin Feb 25 '19 at 13:04
  • Also, the cast makes the loss of precision clearly intentional. Many compilers can emit warning on loss of precision, which can extremely important. (I'm thinking of noobs who are so *certain* that "`size_t` and pointers are really just `unsigned int`...) – Andrew Henle Feb 25 '19 at 13:08
  • @Lundin: Under C89, signed left-shift behavior was defined for all possible left-operand values on implementations whose integer types don't have padding bits or trap representations, but trapping might have been more sensible in situations where the mandated behavior would differ from that of power-of-two multiplication. I've seen no evidence to suggest any intention to change the handling of situations where C89's behavior and power-of-two multiplication would be equivalent. – supercat Feb 25 '19 at 17:44
  • @supercat It's bad enough to accidentally shift data into the sign bit, regardless of UB and signedness format. – Lundin Feb 25 '19 at 19:16
  • @Lundin: In two's-complement notation, the sign bit is simply shorthand for "the state of this bit and an infinite number of bits to the left". The value -3 is an infinite number of ones followed by 01. It may be more convenient to think of a 32-bit two's-complement value as being composed of an infinite number of 0's or 1's followed by exactly 31 more bits, but an infinite number of 1's, followed by 29 more, and then 01, is the same as an infinite number of 1's followed by 01. – supercat Feb 25 '19 at 19:42
  • @Lundin: The only time there would be any anomaly would be in cases where shifting the *entire number*, including the leading ones or zeroes, would yield something that isn't of the form "an infinite number of ones or zeroes, followed by 31 individual bits". I'm not sure why you would think there's something "accidental:" about shifting part of an infinite sequence of ones into a bit that represents an infinite sequence of ones – supercat Feb 25 '19 at 19:53
  • @supercat Because on a 16 bit machine, a whole lot of programmers believe that `uint8_t u8=0x80; ... u8 << 8;` would give the result 32768 and not -32768. And that `(u8<<8) >>8` would give 0x80 and not 0xFF80. This is a well-known source of bugs so there's no point in arguing. – Lundin Feb 26 '19 at 07:16
  • @Lundin: That scenario is one where the behavior would differ from power-of-two multiplication. What does that have to do with left-shifting negative numbers *in cases where the C89 behavior would be equivalent to power-of-two multiplication*? – supercat Feb 26 '19 at 07:47
3

Should a cast be used to truncate a long variable?

If chunk_lsb is an 8 bit object (narrower than variable), use cast or mask (not both). Useful in quieting pedantic warnings about range reduction. I prefer the mask - unless the compiler is picky.

uint8_t chunk_lsb = (uint8_t) variable;
// or 
uint8_t chunk_lsb = variable & 0xFFu;

Otherwise use the mask.

unsigned chunk_lsb = variable & 0xFFu;
chux - Reinstate Monica
  • 143,097
  • 13
  • 135
  • 256
-1

Maybe there's none and just use the cast to reduce calculations is the best way?

General speaking, the asm code will be the same, so in terms of speed, it does not matter which one you use:

What do you guys think?

IMO, the first one is clearer, in terms of readability, but I cannot figure out a coding standard or guideline which supports my preference. Anyway, I would use a const variable if your preference is the second one, to remove magic numbers and make clearer that the purpose is masking (assuming that you have chosen a correct name for the const variable).

Jose
  • 3,306
  • 1
  • 17
  • 22
  • 3
    *Anyway, I would use a `const` variable if your preference is the second one, to remove magic numbers and make clearer that the purpose is masking* Anyone who thinks `0xFF` in the context of bit masking is a "magic number" that has to be papered over with a variable is engaging in cargo-cult programming. What possible variable can be more clear than `0xFF` as a bit mask? `one_byte_bit_mask_0xFF`? – Andrew Henle Feb 25 '19 at 13:05
  • IMO, if the purpose is to mask, the word mask is clearer than any (magic) number. – Jose Feb 25 '19 at 13:44
  • *IMO, if the purpose is to mask, the word mask is clearer than any (magic) number* Oh? How many bits does this mask: `unsigned int result = input & mask;` And no, I didn't DV. – Andrew Henle Feb 25 '19 at 13:57
  • `0xFF` vs `one_byte_bit_mask_0xFF` vs `mask`, I choose the second one as the least-bad option. – Jose Feb 25 '19 at 14:04