2

I have to read out the bytes of a uint32_t variable, and I have seen this kind of implementation from a colleague of mine. My question is whether the behaviour of this code example is reliable on "nearly every" 32-bit microcontroller. Can it be expected to work on every 32-bit microcontroller, or am I relying on platform-specific behaviour? P.S.: The endianness of the system need not be considered in this example.

uint8_t     byte0=0;
uint8_t     byte1=0;
uint8_t     byte2=0;
uint8_t     byte3=0;
uint8_t     *byte_pointer;  //byte_pointer
uint32_t    *bridge_pointer;//pointer_bridge between 32bit and 8 bit variable
uint32_t    var=0x00010203;

bridge_pointer=&var;    //bridge_pointer point to var
byte_pointer=(uint8_t *)(bridge_pointer);   //let the byte_pointer point to the first byte of var

byte0=*(byte_pointer+0);    //saves byte 0
byte1=*(byte_pointer+1);    //saves byte 1
byte2=*(byte_pointer+2);    //saves byte 2
byte3=*(byte_pointer+3);    //saves byte 3

Thanks in Advance

chhegema
  • `byte0 = byte_pointer[0]` etc. would be more elegant (and is equivalent to `*(byte_pointer + 0)`). Also I don't think the `bridge_pointer` is strictly necessary, you could cast `&var` to `uint8_t *` immediately. – Kninnug Sep 23 '14 at 12:10
  • You might want to read about [aliasing](http://en.wikipedia.org/wiki/Aliasing_%28computing%29) and [pointer aliasing](http://en.wikipedia.org/wiki/Pointer_aliasing). – Some programmer dude Sep 23 '14 at 12:11

3 Answers

4

You should declare byte_pointer as unsigned char *; then your example will work, provided you accept that the results differ between little-endian and big-endian systems. Here is a solution that does not depend on endianness:

uint8_t byte0 = var;
uint8_t byte1 = var>>8;
uint8_t byte2 = var>>16;
uint8_t byte3 = var>>24;

byte0 will be the least significant byte (LSB).
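As a rough sketch of the same idea wrapped in reusable functions (the names `u32_to_bytes_lsb_first` and `bytes_lsb_first_to_u32` are made up for illustration and are not part of the original answer), the shifts can be put in a loop and paired with the inverse operation:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: split v into four bytes, least significant byte first.
   Shifts operate on the value, so the result does not depend on endianness. */
static void u32_to_bytes_lsb_first(uint32_t v, uint8_t out[4])
{
    assert(out != NULL);
    for (int i = 0; i < 4; i++) {
        out[i] = (uint8_t)(v >> (8 * i));   /* keeps only the low 8 bits */
    }
}

/* Hypothetical helper: rebuild the value from bytes stored LSB first. */
static uint32_t bytes_lsb_first_to_u32(const uint8_t in[4])
{
    assert(in != NULL);
    uint32_t v = 0;
    for (int i = 0; i < 4; i++) {
        v |= (uint32_t)in[i] << (8 * i);    /* widen before shifting */
    }
    return v;
}
```

With `var = 0x00010203`, this should give the bytes 0x03, 0x02, 0x01, 0x00 on any platform, because no pointer to the object representation is involved.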

mch
3
byte0=*(byte_pointer+0);    //saves byte 0

This line (and the following ones) violates strict aliasing: an object declared as uint32_t is accessed through an lvalue of type uint8_t. unsigned char should be used instead of uint8_t, because lvalues of character type are allowed to access objects of a different type. (If uint8_t exists, it behaves the same as unsigned char in every other respect; the aliasing exemption, however, is granted only to character types.)

unsigned char *byte_pointer = (unsigned char *)(bridge_pointer);
uint8_t byte0 = *(byte_pointer+0);
    // byte0 can still be uint8_t; what matters for aliasing is the type used to access var

As mentioned in a comment, byte_pointer[0] is equivalent to *(byte_pointer+0) and is more common.

With this change, the code has well-defined behaviour and is portable to implementations that provide uint32_t and uint8_t, although endianness may lead to different results, as noted in the question.

The relevant standard parts for strict aliasing are 6.5 p6/7.
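For illustration, here is a minimal sketch of reading the object representation in a way that stays within those rules. The function names are hypothetical. The first variant uses an unsigned char lvalue as described above; the second uses `memcpy`, which is a different technique from the one in this answer but is also commonly used for the same purpose:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy the bytes of *src in storage order,
   reading them through an unsigned char lvalue. */
static void u32_object_bytes(const uint32_t *src,
                             unsigned char out[sizeof(uint32_t)])
{
    assert(src != NULL && out != NULL);
    const unsigned char *p = (const unsigned char *)src;
    for (size_t i = 0; i < sizeof *src; i++) {
        out[i] = p[i];
    }
}

/* Hypothetical alternative: let memcpy copy the object representation. */
static void u32_object_bytes_memcpy(const uint32_t *src,
                                    unsigned char out[sizeof(uint32_t)])
{
    assert(src != NULL && out != NULL);
    memcpy(out, src, sizeof *src);
}
```

Both variants return the bytes in the order they are stored in memory, so the result still depends on the endianness of the target.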

mafso
  • Thank you for your answer, it helped me very well. The other answers were useful too, thanks at all. – chhegema Sep 23 '14 at 12:55
  • In practice, `uint8_t` will behave exactly as unsigned char and will therefore not break strict aliasing. [See this](http://stackoverflow.com/questions/12666146/can-uint8-t-be-a-non-character-type). – Lundin Sep 23 '14 at 13:50
  • @Lundin: IIRC the gcc mailing list discussed that some time ago and it was considered a bug in gcc, that it doesn't use that aliasing information and would likely be subject to change. I'm not sure about the current status. I'll link when I've found it. – mafso Sep 23 '14 at 14:11
  • @Lundin: Sorry, I can't find it anymore. It's mentioned [here](http://stackoverflow.com/questions/16138237/when-is-uint8-t-%E2%89%A0-unsigned-char) in the comments to R's answer. Maybe I find it during the next days. That aside, I really think, `unsigned char` should be used here. This UB may be helpful for optimizations and we should assume it will be used. (Reasoning along "this works in reality, so just don't care" led to broken code when Gcc used UB of signed-integer overflow, aliasing analysis at all, dereferencing `NULL` pointers, ...) – mafso Sep 23 '14 at 16:14
  • @Lundin: It's mentioned in the mailing lists for example [here](https://gcc.gnu.org/ml/gcc/2000-07/msg00147.html) and [here](https://gcc.gnu.org/ml/gcc/2000-05/msg01106.html). Couldn't find a better (and newer) reference. And as I said, I don't know about the current status. But the standard is clear and there are optimization opportunities, so some day some compiler will probably make use of it. – mafso Sep 24 '14 at 10:34
  • @mafso They seem to have a concern that `char` might not be 8 bits wide, but something else. While `uint8_t` is always exactly 8 bits. The solution then is to add a static assert, to ensure that uint8_t has the same implementation as unsigned char. Again, I very much doubt this will ever cause problems in the real world. – Lundin Sep 24 '14 at 11:29
  • @Lundin If `CHAR_BIT` is more than 8, `uint8_t` must not exist (as it must be addressable and `char` is the smallest addressable type). So this case is easy (just don't provide the type). And to me, this sounds like gcc would already assume that `int8_t` doesn't alias if there weren't some problems implementing it. – mafso Sep 24 '14 at 12:18
1

In practice, the code is portable except for the endianness issue. Accessing part of a uint32_t through a uint8_t pointer will always work in the real world, whatever the standard formally says.

Whether uint8_t is considered a character type or not is debated, but that discussion is only of academic interest. (If it is considered a character type, it does not break the aliasing rule in 6.5/7 of the standard.) In practice, uint32_t will not contain any padding bits or other such theoretical nonsense that the standard allows.

To avoid endianness problems, I would suggest rewriting the code to use bit shifts, as demonstrated in @mch's answer.
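If the uint8_t assumption is to be made explicit rather than taken on trust, a compile-time check along the lines of the static assert suggested in the comments on mafso's answer might look like the sketch below. It assumes a C11 compiler (for `static_assert` and `_Generic`), and the function name `first_storage_byte` is invented for this example:

```c
#include <assert.h>   /* static_assert (C11) */
#include <limits.h>   /* CHAR_BIT */
#include <stddef.h>
#include <stdint.h>

/* Refuse to compile unless uint8_t is just another name for unsigned char,
   in which case it is a character type and may alias other objects. */
static_assert(CHAR_BIT == 8, "char is not 8 bits wide");
static_assert(_Generic((uint8_t)0, unsigned char: 1, default: 0),
              "uint8_t is not a typedef for unsigned char");

/* Hypothetical helper: return the byte stored at the lowest address of *p.
   Which byte of the value that is depends on the endianness of the target. */
static uint8_t first_storage_byte(const uint32_t *p)
{
    assert(p != NULL);
    return *(const uint8_t *)p;
}
```

On a toolchain where uint8_t is some other 8-bit type, the build fails instead of silently relying on the assumption.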

Lundin
  • "[...] padding bits or other such theoretical nonsense that the standard allows."--It doesn't. Cf. C11 (n1520) 7.20.1.1 p1/2. The `(u)intN_t` types behave quite "normal": two's complement and no padding bits. – mafso Sep 23 '14 at 16:17
  • @mafso So there is no reason why uint8_t wouldn't work. – Lundin Sep 24 '14 at 06:59
  • @Lundin: The Standard would allow an implementation to use any of 32! (i.e. about 2.63E+35) ways of mapping the bits of a `uint32_t` to the bits of four consecutive uint8_t values. In practice, two mappings are far more common than any others, and there are probably at most two more that appear in *any* non-contrived implementations. – supercat Oct 17 '17 at 22:46