unexpected byte order after casting pointer-to-char into pointer-to-int

Question

unsigned char tab[4] = {0, 0, 0, 14};

If I print as individual bytes...

printf("tab[0] : %u\n", tab[0]); // output: 0
printf("tab[1] : %u\n", tab[1]); // output: 0
printf("tab[2] : %u\n", tab[2]); // output: 0
printf("tab[3] : %u\n", tab[3]); // output: 14

If I print as an integer...

unsigned int *fourbyte;
fourbyte = *((unsigned int *)tab);
printf("fourbyte : %u\n", fourbyte); // output: 234881024

My output in binary is: 00001110 00000000 00000000 00000000, which is the data I wanted but in the order tab[3] tab[2] tab[1] tab[0]. Is there an explanation for why the unsigned int pointer points to the last byte instead of the first?

This code invokes undefined behaviour in multiple places. Also learn about the effective type (aka "strict aliasing") rule. – too honest for this site, Sep 19 '18 at 13:44
UB concerns aside, is this a question about big-endian vs little-endian storage? – Tim Randall, Sep 19 '18 at 13:57
Well... `fourbyte = *((unsigned int *)tab);` ==> `fourbyte = (unsigned int *)tab;` and `printf("fourbyte : %u\n", fourbyte);` ==> `printf("fourbyte : %u\n", *fourbyte);`, but there are still other problems with your approach. – Support Ukraine, Sep 19 '18 at 14:37
Quote: "why the unsigned int pointer point on the last byte instead of the first". Read about endianness (your machine is apparently little-endian). BTW: there is no way the posted code can produce the stated output. Please post the correct code. – Support Ukraine, Sep 19 '18 at 14:44
Yep, I read about endianness, and that explains why this could happen. It was more a problem of understanding this behavior, so that was just an example, but I changed my approach because of the strict aliasing problem! – abt jeremie, Sep 19 '18 at 15:10
In such investigative code, it is more informative to use non-decimal output such as `printf("fourbyte : %X\n", fourbyte);` (X not u). 234881024 is 0x0E000000. – chux - Reinstate Monica, Sep 19 '18 at 16:21

score 1 · Answer 1 · answered Sep 19 '18 at 18:25

The correct answer here is that you should not have expected any relationship, in order or otherwise. Except for unions, the C standard does not define a linear address space in which objects of different types can overlap. On many architecture/compiler tool-chain combinations these coincidences do occur from time to time, but you should never rely on them. The fact that casting a pointer to a suitable scalar type yields a number comparable to others of the same type in no way implies that the number is any particular memory address.

So:

int* p;
int z = 3;
int* pz = &z;
size_t cookie = (size_t)pz;
p = (int*)cookie;
printf("%d", *p); // Prints 3.

This works because the standard says it must work when cookie is derived from the same type of pointer that it is being converted to. Converting to any other type is undefined behavior. Pointers do not represent memory; they reference 'storage' in the abstract. They are merely references to objects or NULL, and the standard defines how pointers to the same object must behave and how they can be converted to scalar values and back again.
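As a hedged variation on the snippet above: the C standard spells out the integer round trip for pointers to void through the optional types intptr_t and uintptr_t from <stdint.h>, so this sketch uses uintptr_t instead of size_t. It assumes the implementation provides uintptr_t; the function name read_via_cookie is made up for illustration.

```c
#include <stdint.h>

/* Round-trip a pointer through an integer "cookie" and read the object back.
   Assumption: uintptr_t exists on this implementation (it is optional). */
int read_via_cookie(int *pz)
{
    uintptr_t cookie = (uintptr_t)(void *)pz;  /* pointer -> integer */
    int *p = (int *)(void *)cookie;            /* integer -> same pointer type */
    return *p;                                 /* reads the original object */
}
```

Calling read_via_cookie(&z) with int z = 3 returns 3. The numeric value of cookie itself is not given any meaning here; only the round trip is relied upon.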

Given:

char array[5] = "five";

The standard says that &(array[0]) < &(array[1]) and that (&(array[0]) + 1) == &(array[1]), but it is silent on how the elements of array are ordered in memory. Compiler writers are free to use whatever machine code and memory layout they deem appropriate for the target architecture.
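The two relations quoted above can be checked directly. This is only a small sketch; the function name adjacent_elements_ok is hypothetical, and the element-count subtraction is added for illustration.

```c
#include <stddef.h>

/* Returns 1 if the pointer relations quoted above hold for this array. */
int adjacent_elements_ok(void)
{
    char array[5] = "five";
    int ordered  = &array[0] < &array[1];            /* relational comparison */
    int adjacent = (&array[0] + 1) == &array[1];     /* pointer arithmetic */
    ptrdiff_t distance = &array[4] - &array[0];      /* counted in elements */
    return ordered && adjacent && distance == 4;
}
```

All three checks are expressed in terms of array elements, not raw addresses, which is the level at which the language defines pointer comparison and arithmetic.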

In the case of unions, which provide for some overlap of objects in storage, the standard only says that each member must be suitably aligned for its type; just about everything else about them is implementation-defined. The key clause is 6.2.6.1 p7:

When a value is stored in a member of an object of union type, the bytes of the object representation that do not correspond to that member but do correspond to other members take unspecified values.
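A minimal sketch of inspecting a union through an unsigned char member, assuming uint32_t is available. The names word_bytes and low_byte_index are invented for this example. It stores 1 through the integer member and reports which byte position holds it; that position is chosen by the implementation, so the code makes no claim about a particular answer.

```c
#include <stddef.h>
#include <stdint.h>

/* Two members that overlap in storage. */
union word_bytes {
    uint32_t      word;
    unsigned char bytes[sizeof(uint32_t)];
};

/* Store 1 through `word`, then find the byte that holds it via `bytes`.
   The index returned depends on the implementation's byte order. */
size_t low_byte_index(void)
{
    union word_bytes u;
    size_t i;

    u.word = 1;
    for (i = 0; i < sizeof u.bytes; i++) {
        if (u.bytes[i] == 1)
            return i;
    }
    return sizeof u.bytes;  /* not expected: uint32_t has no padding bits */
}
```

On a typical little-endian desktop this would be expected to return 0, and on a big-endian machine the last index, but portable code should not depend on either.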

The gist of all of this is that the C standard defines an abstract machine. The compiler generates an architecture-specific simulation of that machine based on your code. You cannot understand the C abstract machine through simple empirical means because implementation details bleed into your data set. You must limit your observations to those that are relevant to the abstraction. Therefore, avoid undefined behavior and be very aware of implementation-defined behaviors.

Tim Randall · Answer 2 · 2018-09-19T15:58:28.923

0

Your example code is running on a computer that is Little-Endian. This term means that the "first byte" of an integer contains the least significant bits. By contrast, a Big-Endian computer stores the most significant bits in the first byte.

Edited to add: the way that you've demonstrated this is decidedly unsafe, as it relies upon undefined behavior to get "direct access" to the memory. There is a safer demonstration here
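One way to look at the same four bytes without the pointer cast is sketched below. This is an illustration, not necessarily the demonstration referred to above; the function names are hypothetical and uint32_t is assumed to be available. memcpy copies the object representation without violating aliasing or alignment rules, while the shift-based version fixes the byte order explicitly and so gives the same result on any host.

```c
#include <stdint.h>
#include <string.h>

/* Copy four bytes into a uint32_t. The copy is well defined, but the
   numeric result still depends on the host's byte order. */
uint32_t load_native_u32(const unsigned char bytes[4])
{
    uint32_t value;
    memcpy(&value, bytes, sizeof value);
    return value;
}

/* Assemble the value with shifts, treating bytes[0] as most significant.
   The result does not depend on the host's byte order. */
uint32_t load_big_endian_u32(const unsigned char bytes[4])
{
    return ((uint32_t)bytes[0] << 24)
         | ((uint32_t)bytes[1] << 16)
         | ((uint32_t)bytes[2] << 8)
         |  (uint32_t)bytes[3];
}
```

With the bytes {0, 0, 0, 14}, load_big_endian_u32 returns 14 everywhere, whereas load_native_u32 returns 234881024 (0x0E000000) on a little-endian host and 14 on a big-endian one.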

edited Sep 19 '18 at 15:58

answered Sep 19 '18 at 15:53


1

I agree that endianness likely explains the results in this case. It should still be emphatically pointed out that when it comes to UB, you can't count on any result representing anything. The compiler is free to format your hard drive or deliver a much-deserved high voltage potential between the coder's seat and ground. – jwdonahue Sep 19 '18 at 15:58
There is no portable means to determine the endianness of the underlying architecture in C. Given `uint8_t array[sizeof(uint32_t)] = {0xa5}; uint8_t *a = array; uint32_t *b = (uint32_t *)array;`, it is entirely possible that even if `(size_t)a == (size_t)b`, they could be pointing at entirely different ranges of memory. Nothing in the language spec says that any scalar type must have the same alignment as any other, much less that an array of one type has the same alignment as an object of another type. In fact, int types and char types could be located in different virtual memory blocks. – jwdonahue Sep 19 '18 at 16:52
