2

Consider the following example:

#include <stdint.h>

void foo(void)
{
    uint8_t*  ptr_data8;
    uint32_t* ptr_data32;
    uint32_t  data32 = 255;

    ptr_data32 = &data32;

    /* View the same object through a byte-sized pointer */
    ptr_data8 = (uint8_t*)ptr_data32;
}

Depending on the endianness, the memory layout may differ:

Little-Endian:

Address:   [  0|  1|   2|   3]
           -------------------
Value:     [255|  0|   0|   0]

Big-Endian:

Address:   [  0|  1|   2|   3]
           -------------------
Value:     [  0|  0|   0| 255]

So the question is: which address do the pointers point to on each architecture?

Do the pointers point to the lowest address of the whole data element?

[Little Endian]
ptr_data8  --> 0
ptr_data32 --> 0

[Big Endian]
ptr_data8  --> 0
ptr_data32 --> 0

Or do they point to the lowest value/byte of the data element?

[Little Endian]
ptr_data8  --> 0
ptr_data32 --> 0

[Big Endian]
ptr_data8  --> 0
ptr_data32 --> 3

Also, is the address the pointers point to platform-, compiler-, or architecture-dependent, and is this behaviour defined somewhere?

Toby

2 Answers

4

Your guess can be neither proved nor disproved, because the standard does not require the pointers to hold any particular numeric value.

The standard requires your uint32_t* pointer to be convertible to void*, which has the same representation as char* (and by extension, uint8_t*) pointers. The compiler must be able to "round-trip" the pointer like this:

uint32_t *ptr32orig = ... // Assign some valid value
void *tmp1 = (void*)ptr32orig;
char *cptr = (char*)tmp1;
// cptr has the same representation as tmp1
void *tmp2 = (void*)cptr;
// At this point, tmp1 must be equal to tmp2
uint32_t *ptr32back = (uint32_t*)tmp2;
// At this point ptr32back must be equal to ptr32orig

This seems to imply that cptr must point to the same location as ptr32orig, but that's not right: the compiler is allowed to do whatever "magic" it wants when converting ptr32orig to tmp1, and then undo its effects when converting tmp2 back to uint32_t*.

Sergey Kalinichenko
  • The C language does, however, allow lvalue access of the data that `ptr32orig` points at through an lvalue expression of character type (such as uint8_t). If you take `*cptr` then you must access the data pointed at by `ptr32orig`, in some implementation-defined manner. This _does_ imply that `cptr` must point at the same location as `ptr32orig`. – Lundin Nov 08 '17 at 12:33
  • Note that none of the casts in this example are needed. – Stargateur Nov 08 '17 at 12:34
  • 1
    Doesn't your answer contradict your older answer here: https://stackoverflow.com/a/35496066/1187415 ? *"Converting an `int*` to `char*`, therefore, is allowed, and it is also portable. The pointer would be pointing to the initial byte of your `int`'s internal representation."* – Martin R Nov 08 '17 at 12:34
  • 2
    @MartinR No, the *value* of a pointer is not required to be the same after the cast, but it must "point to" the same byte. So this only matters when you dereference it. – Stargateur Nov 08 '17 at 12:43
  • 1
    @MartinR There is no contradiction. Here is an example: let's say I have `uint32_t *ptr` that is equal to `0xC003`, with bytes of `uint32_t` pointed to by this pointer located in bytes `0xC000..0xC003`. The standard requires that when I cast `ptr` to `char*` I get back `0xC000`. It does not require the cast to be a no-op, so the compiler is allowed to do some "pointer translation". In the interests of full disclosure, I've never seen a platform with such crazy behavior, but I don't think the standard prohibits that. – Sergey Kalinichenko Nov 08 '17 at 12:58
0

Is my guess correct? If not, what's right?

Yes, the pointer will point at the first address regardless of endianness. The content stored at that address will vary depending on endianness.
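
As a minimal sketch of this point, the two functions below (their names are hypothetical, not from the question) check that both pointers refer to the same starting byte and read whatever byte happens to be stored there. The sketch uses `unsigned char`, which is always a character type, rather than `uint8_t`.

```c
#include <stdint.h>

/* Returns 1 if the byte pointer and the uint32_t pointer compare equal,
   i.e. both refer to the start of the same object. */
int same_start_address(void)
{
    uint32_t data32 = 255;
    uint32_t *ptr_data32 = &data32;
    unsigned char *ptr_data8 = (unsigned char *)ptr_data32;

    return (void *)ptr_data8 == (void *)ptr_data32;
}

/* Returns the byte stored at the lowest address of a uint32_t holding 255.
   The address is the same on every platform; only this content differs. */
unsigned first_byte_of_255(void)
{
    uint32_t data32 = 255;
    const unsigned char *p = (const unsigned char *)&data32;

    return p[0]; /* 255 on little-endian, 0 on big-endian */
}
```

Calling `first_byte_of_255()` is one simple way to tell the two layouts from the question apart at run time.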


Is the address where the pointers point to platform/compiler/architecture dependent?

No.

(Except that C does not place any restrictions on how a pointer is represented in practice.)


Is the behaviour defined somewhere?

Yes. The rules for pointer conversions (C11 6.3.2.3) state:

When a pointer to an object is converted to a pointer to a character type, the result points to the lowest addressed byte of the object.
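
The same paragraph of the standard goes on to say that successive increments of the result, up to the size of the object, yield pointers to the remaining bytes. A small sketch of what that permits, with `sum_of_bytes` being a made-up helper name for illustration:

```c
#include <stddef.h>
#include <stdint.h>

/* Visits every byte of *obj through a character pointer and adds them up.
   Which byte holds which part of the value depends on endianness,
   but the walk itself is well defined on any platform. */
unsigned sum_of_bytes(const uint32_t *obj)
{
    const unsigned char *p = (const unsigned char *)obj; /* lowest addressed byte */
    unsigned sum = 0;

    for (size_t i = 0; i < sizeof *obj; i++) {
        sum += p[i]; /* p + i stays inside the object */
    }
    return sum;
}
```

The sum is used here only because it does not depend on byte order; printing `p[i]` in the loop instead would show the layout from the question.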

Furthermore, the rules for effective type and strict aliasing (C11 6.5) allow you to access data of another type through a pointer to a character type such as uint8_t.

You are not allowed to do it the other way around though: if you have an array of uint8_t, you are not allowed to point at the first element of that array with a uint32_t* and then access the contents. Doing so would be a "strict aliasing violation".
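
If you do need a `uint32_t` out of a byte array, one common way to stay within the rules is to copy the bytes with `memcpy` instead of dereferencing a converted pointer. A minimal sketch, where `load_u32` is a hypothetical helper name and the caller is assumed to supply at least `sizeof(uint32_t)` bytes:

```c
#include <stdint.h>
#include <string.h>

/* Reads a uint32_t from a byte buffer without ever dereferencing a uint32_t*.
   The bytes are interpreted in the platform's native byte order. */
uint32_t load_u32(const uint8_t *bytes)
{
    uint32_t value;

    memcpy(&value, bytes, sizeof value); /* copies the object representation */
    return value;
}
```

This also sidesteps alignment problems, since the byte buffer does not have to be aligned for `uint32_t`.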

Lundin
  • Are you sure about strict aliasing requiring pointer equality? Couldn't the compiler give you a numerically different pointer when you convert `uint32_t*` to `uint8_t*` to enable byte-by-byte access to proceed correctly? – Sergey Kalinichenko Nov 08 '17 at 12:31
  • uint8_t isn't required to be a typedef to a character type (6.2.5) – Jon Chesterfield Nov 08 '17 at 12:34
  • @dasblinkenlight The requirement doesn't actually come from strict aliasing but from the rules of pointer conversions, C11 6.3.2.3: "When a pointer to an object is converted to a pointer to a character type, the result points to the lowest addressed byte of the object." Let me add this text to the answer. – Lundin Nov 08 '17 at 12:43
  • Great answer and thanks for the reference to the C-Standard. Is it also defined somewhere, that a pointer (without conversion) points to the lowest address of an object? – Toby Nov 08 '17 at 12:47
  • @JonChesterfield The stdint.h types map to the primitive data types. For `uint8_t` to be meaningful in real-world applications, it must always be a character type. A compiler treating it as something else would be useless and broken. Try it yourself on any real world compiler: `_Generic((uint8_t){0}, unsigned char: puts("I am a character type"));` – Lundin Nov 08 '17 at 12:49
  • @Stargateur No, but it must be a character type in order to make stdint.h usable in real-world applications, see comment above. – Lundin Nov 08 '17 at 12:50
  • @Lundin `uint8_t` is *generally* a `char`, but we are talking about the strict C standard, and it doesn't define `uint8_t` as a character type (AFAIK); `char` could be wider than 8 bits on some implementations. Your example with `_Generic` will just show how the implementation defines `uint8_t`, but it does not prove that the standard says `uint8_t` is a character type. – Stargateur Nov 08 '17 at 12:54
  • 2
    @Stargateur Such an implementation would not support `uint8_t`, so that specific case is irrelevant. In addition, the only real-world systems that actually have a `char` type wider than 8 bits are various dysfunctional, mostly obsolete DSPs, most of them from TI. Anyone designing for C source compatibility with those DSPs is seriously wasting their time. Anyone writing programs for those DSPs in C and not in assembler is most likely confused. Instead, design your C programs for compatibility with real-world computers. – Lundin Nov 08 '17 at 12:59
  • @Lundin There's a fairly long track record of compilers doing inconvenient things in the name of optimisation. Some discussion from gcc https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66110 , llvm https://bugs.llvm.org/show_bug.cgi?id=31410. Both currently treat uint8_t as a character. – Jon Chesterfield Nov 08 '17 at 13:03
  • @Lundin If you just play the "real-world" joker card, I will stop trying to explain, but what you said conflicts with the original spirit of C and the intent of the committee. C tries to leave implementations free so that it can be very portable. Maybe you don't care about that 1% of strange implementations, but they exist. – Stargateur Nov 08 '17 at 13:04
  • @Stargateur The "first spirit of C" was unfortunately one of the dumber things in computer history. They assumed that machines that did not use 2's complement would exist. They assumed that machines would have mysterious representations of pointers that don't correspond to physical addresses. They assumed that one byte is not necessarily 8 bits. And so on. History has proven that all of this was pure and utter nonsense. All it is good for is endless language-lawyer discussions like this one, that have been going on since the 1970s. This isn't productive. That is why we should focus on the real world. – Lundin Nov 08 '17 at 13:34
  • They didn't need to assume. It was just a **plain fact**. – Antti Haapala -- Слава Україні Nov 08 '17 at 14:15
  • @AnttiHaapala Citation needed. Try to research how many one's complement computers have existed in the real world. I once did this out of curiosity and only found one, some highly experimental supercomputer in the 1960s. So at the point when C was designed in the 1970s, only 1 or 2 such computers had ever existed. 20 years later when C was standardized, the number of one's complement computers that had ever existed remained the same. None of them were in use. Whether any signed magnitude computers have ever existed, I do not know. Yet the C90 committee insisted on supporting them. – Lundin Nov 08 '17 at 14:26
  • One's complement?! https://public.support.unisys.com/framework/publicterms.aspx?returnurl=%2f2200%2fdocs%2fcp14.0%2fpdf%2f78310422-011.pdf – Antti Haapala -- Слава Україні Nov 08 '17 at 14:37
  • "UCS C represents an integer in 36-bit ones complement form (or 72-bit ones complement form, if the long long type attribute is specified). Unless the CONFORMANCE/ TWOSARITH keyword is used, there is no representation change when converting a signed int value to unsigned int or when converting an unsigned int value to signed int. For more information on the CONFORMANCE keyword, see the C Compiler Programming Reference Manual Volume 2." – Antti Haapala -- Слава Україні Nov 08 '17 at 14:38
  • And that document is from *this* decade. – Antti Haapala -- Слава Україні Nov 08 '17 at 14:39
  • @AnttiHaapala Well that sounds like a lovely system to work with. I'll adjust all my existing C programs for compatibility right away. – Lundin Nov 08 '17 at 14:42
  • Would the downvoters care to explain what is incorrect in the answer? – Lundin Nov 08 '17 at 14:42
  • @Lundin I already said it: `uint8_t` **is not a character type**; the only character types are "char, signed char and unsigned char". – Stargateur Nov 08 '17 at 16:30