
In these comments user @Deduplicator insists that the strict aliasing rule permits access through an incompatible type if either of the aliased or the aliasing pointer is a pointer-to-character type (qualified or unqualified, signed or unsigned char *). So, his assertion is basically that both

long long foo;
char *p = (char *)&foo;
*p; // just in order to dereference 'p'

and

char foo[sizeof(long long)];
long long *p = (long long *)&foo[0];
*p; // just in order to dereference 'p'

are conforming and have defined behavior.

In my reading, however, only the first form is valid, that is, when the aliasing pointer is a pointer-to-char. One can't do it in the other direction, i.e. when the aliasing pointer points to an incompatible type (other than a character type) and the aliased pointer is a char *.

So, the second snippet above would have undefined behavior.
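To make the first (character-pointer) direction concrete, here is a minimal sketch of inspecting a long long byte by byte. The helper name `sum_bytes` is hypothetical and only serves as an illustration; it is not from the discussion.

```c
#include <stddef.h>

/* Hypothetical helper: adds up the bytes of a long long's object
   representation. Accessing the object through an unsigned char lvalue
   is the direction that C99 6.5 p7 explicitly permits. */
unsigned sum_bytes(long long value)
{
    const unsigned char *p = (const unsigned char *)&value;
    unsigned sum = 0;
    for (size_t i = 0; i < sizeof value; i++)
        sum += p[i];
    return sum;
}
```

The result does not depend on byte order, since every byte of the representation is visited exactly once.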

Which reading is correct? For the record, I have already read this question and answer, and there the accepted answer explicitly states that

The rules allow an exception for char *. It's always assumed that char * aliases other types. However this won't work the other way, there's no assumption that your struct aliases a buffer of chars.

(emphasis mine)

  • 7
    Indeed, only the first version is allowed by the standard. – Oliver Charlesworth Jul 06 '14 at 17:19
  • 4
    @OliCharlesworth Well, I think so too. Tell that to Deduplicator, who (quite arrogantly) asserts that "I need to learn to read the standard better"... >. – The Paramagnetic Croissant Jul 06 '14 at 17:20
  • 3
    FWIW the relevant C99 standard clause is 6.5 p7. – Oliver Charlesworth Jul 06 '14 at 17:26
  • Please add an `_Alignas(long long)` to the char array, otherwise mis-alignment might cause UB. – Deduplicator Jul 06 '14 at 17:48
  • @Kerrek: How would reading characters from a file into a buffer and then using the buffer as a `struct whatever` (depending on the initial sequence) work then? – Deduplicator Jul 06 '14 at 17:52
  • 2
    @Deduplicator: If you mean something like `char buf[...]; fread(buf, ...); foo(((MyStruct *)buf)->member);`, it doesn't work. – Oliver Charlesworth Jul 06 '14 at 17:54
  • @Oli: You forgot the alignment specifier. With it, that would work. Also, how would you code an allocator then? – Deduplicator Jul 06 '14 at 18:08
  • 2
    @Deduplicator: Alignment is another matter. See the standard quote I referenced above for why this isn't valid from an aliasing POV. – Oliver Charlesworth Jul 06 '14 at 18:18
  • @Oli: Added all those quotes from the standard I think might be handy for proving your point, and an example usage which is correct (imho) as an answer. Please prove it wrong. – Deduplicator Jul 06 '14 at 18:21
  • 1
    @Deduplicator but it is not just about alignment, it is [about optimization](http://stackoverflow.com/questions/20922609/why-does-optimisation-kill-this-function) as well and what assumptions the compiler is allowed to make. I don't see how any type can be the effective type of `char` ... whereas `char` has a special exception carved out for it. – Shafik Yaghmour Jul 06 '14 at 18:22
  • 2
    @Deduplicator: Either don't read into a buffer, but into an actually existing type (this only works for fundamental types), or memcopy individual struct elements afterwards one by one. – Kerrek SB Jul 06 '14 at 18:31
  • @KerrekSB: Seems I must take care not to use a buffer with a declared type, then I'm ok. – Deduplicator Jul 06 '14 at 18:51
  • In the end @OliCharlesworth found the necessary additional quote to make me see the additional restrictions when not using dynamic allocation. Thanks a lot! – Deduplicator Jul 06 '14 at 18:58
  • The `long long *p = (long long *)&foo[0]; *p` example certainly has a potential for alignment problems. e.g. `foo` is on an odd address and `p` may need an even (or quad) address. But is this the "strict aliasing rule" issue? Thought that had to do with http://stackoverflow.com/questions/98650/what-is-the-strict-aliasing-rule – chux - Reinstate Monica Jul 06 '14 at 21:40
  • @chux that is not the entire strict aliasing issue itself, but alignment is *part of* why the strict aliasing rule exists. It's not the only reason, though (it's about the compiler's ability to optimize based on assumed program invariants too). – The Paramagnetic Croissant Jul 06 '14 at 21:43
  • @chux aliasing and alignment are separate issues. If the access does not meet alignment requirements it is UB. If the access does meet alignment requirements then we consider other things such as aliasing. – M.M Jul 06 '14 at 23:33
  • @Matt McNabb Agree aliasing and alignment are separate issues. Hence my comment, as the title is "strict aliasing rule", but the examples appear to be more about alignment. Certainly both must be OK for code to work, but I think they can be addressed separately in this post. – chux - Reinstate Monica Jul 06 '14 at 23:37
  • @chux it seems to me that the question is only about aliasing. (The actual code example may also have alignment problems, but the question is just asking about the aliasing issue). – M.M Jul 06 '14 at 23:39
  • @user3477950 note that the accepted answer is actually wrong on the [second linked thread](http://stackoverflow.com/questions/98650/what-is-the-strict-aliasing-rule). The comment by RMartinhoFernandes sums it up. – M.M Jul 06 '14 at 23:45

2 Answers

5

You are correct to say that this is not valid. As you yourself have quoted (so I shall not re-quote here), the only cast guaranteed to be valid for access is from any other type to char *.

The other form is indeed against the standard and causes undefined behaviour. As a little bonus, however, let us discuss the reasoning behind this rule.

On every significant architecture, char is the only type that allows completely unaligned access. This is because byte-load instructions have to work on any byte address, otherwise they would be all but useless. It means that an indirect read through a pointer to char will always be valid on every CPU I know of.

The reverse does not hold, however: on most architectures you cannot read a uint64_t unless the pointer is aligned to 8 bytes.
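A portable way to read a 64-bit value from bytes at an arbitrary address is to copy them into a properly typed object first. The following is a minimal sketch under that assumption; the helper name `load_u64` is hypothetical and does not come from this answer.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reads a uint64_t from a byte buffer that may be
   misaligned. memcpy copies the bytes into a real uint64_t object, so
   neither alignment nor aliasing rules are violated. */
uint64_t load_u64(const unsigned char *src)
{
    uint64_t value;
    memcpy(&value, src, sizeof value);
    return value;
}
```

The value comes back in the host's own byte order, so this sketch only round-trips data that was produced on the same kind of machine.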

That said, a very common compiler extension allows you to cast properly aligned pointers from char to other types and access them, but this is non-standard. Also note that if you cast a pointer to any type to a pointer to char and then cast it back, the resulting pointer is guaranteed to compare equal to the original pointer. Therefore this is ok:

struct x *mystruct = MakeAMyStruct();
char * foo = (char *)mystruct;
struct x *mystruct2 = (struct x *)foo;

And mystruct2 will equal mystruct. This also guarantees the struct is properly aligned for its needs.

So basically, if you want a pointer to char and a pointer to another type, always declare the pointer to the other type then cast to char. Or even better use a union, that is what they are basically for...
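As a minimal sketch of the union approach, assuming C99 or later (where reading a union member other than the one last written reinterprets the stored bytes) and not C++: the names `ll_bytes` and `ll_from_bytes` are hypothetical.

```c
#include <stddef.h>

/* Hypothetical union: the byte array and the long long share storage,
   and the union as a whole is aligned suitably for long long. */
union ll_bytes {
    long long value;
    unsigned char bytes[sizeof(long long)];
};

/* Builds a long long from raw bytes by filling the char member
   and then reading the long long member of the same union. */
long long ll_from_bytes(const unsigned char *src)
{
    union ll_bytes u;
    for (size_t i = 0; i < sizeof u.bytes; i++)
        u.bytes[i] = src[i];
    return u.value;
}
```

Because the storage is declared as a union containing a long long, the alignment concern of a bare char array does not arise here.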

Note, there is a notable exception to the rule however. Some old implementations of malloc used to return a char*. This pointer is always guaranteed to be castable to any type successfully without breaking aliasing rules.

Vality
  • There is no aliasing problem with any pointer cast; the potential problem only arises when dereferencing the result of the cast and then using that expression to access memory. And the behaviour depends on the effective type of the memory, and the type of the dereferencing expression; it is completely independent of the types of any other pointers that might exist or have existed. So the case in your last paragraph actually does not need an exception. – M.M Jun 08 '17 at 23:19
2

Deduplicator is correct. The undefined behaviour that allows compilers to implement "strict aliasing" optimizations doesn't apply when character values are being used to produce a representation of an object.

Certain object representations need not represent a value of the object type. If the stored value of an object has such a representation and is read by an lvalue expression that does not have character type, the behavior is undefined. If such a representation is produced by a side effect that modifies all or any part of the object by an lvalue expression that does not have character type, the behavior is undefined. Such a representation is called a trap representation.

However your second example has undefined behaviour because foo is uninitialized. If you initialize foo then it only has implementation defined behaviour. It depends on the implementation defined alignment requirements of long long and whether long long has any implementation defined pad bits.

Consider if you change your second example to this:

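#include <stdlib.h>

long long bar() {
    char *foo = malloc(sizeof(long long));
    char c;
    /* fill every byte of the allocated storage through a char lvalue */
    for(c = 0; c < sizeof(long long); c++)
        foo[c] = c;
    /* reinterpret the malloc'ed bytes, which have no declared type */
    long long *p = (long long *) foo;
    return *p;
}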

Now alignment is no longer an issue, and this example depends only on the implementation-defined representation of long long. What value is returned depends on the representation of long long, but if that representation is defined as having no pad bits then this function must always return the same value and it must also always be a valid value. Without pad bits this function can't generate a trap representation, and so the compiler cannot perform any strict aliasing type optimizations on it.

You have to look pretty hard to find a standard conforming implementation of C that has implementation defined pad bits in any of its integer types. I doubt you'll find one that implements any sort of strict aliasing type of optimization. In other words, compilers don't use the undefined behaviour caused by accessing a trap representation to allow strict-aliasing optimizations because no compiler that implements strict-aliasing optimizations has defined any trap representations.

Note also that had foo been initialized with all zeros ('\0' characters) then this function wouldn't have any undefined or implementation defined behaviour. An all-bits-zero representation of an integer type is guaranteed not to be a trap representation and guaranteed to have the value 0.
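The all-zeros case can be sketched as follows. This is a minimal illustration, assuming the same malloc-style storage as in bar(); the name `bar_zeroed` and the -1 error return are hypothetical choices made for this sketch.

```c
#include <stdlib.h>

/* Hypothetical variant of bar(): calloc zero-fills the storage, so the
   bytes form the all-bits-zero representation of long long. */
long long bar_zeroed(void)
{
    char *foo = calloc(1, sizeof(long long));
    if (foo == NULL)
        return -1;  /* allocation failed; reported this way only for the sketch */
    long long result = *(long long *)foo;  /* storage has no declared type */
    free(foo);
    return result;
}
```

When the allocation succeeds, the function returns 0 regardless of byte order or pad bits.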

Now for a strictly conforming example that uses char values to create a guaranteed valid (possibly non-zero) representation of a long long value:

#include <stdio.h>
#include <stdlib.h>

int
main(int argc, char **argv) {
    size_t i;
    long long l;
    char *buf;

    if (argc < 2) {
        return 1;
    }
    buf = malloc(sizeof l);
    if (buf == NULL) {
        return 1;
    }
    l = strtoll(argv[1], NULL, 10);
    for (i = 0; i < sizeof l; i++) {
        buf[i] = ((char *) &l)[i];
    }
    printf("%lld\n", *(long long *)buf);
    return 0;
}

This example has no undefined behaviour and is not dependent on the alignment or representation of long long. This is the sort of code that the character type exception on accessing objects was created for. In particular this means that Standard C lets you implement your own memcpy function in portable C code.
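A byte-wise copy of that kind might look like the following. This is only a sketch: the name `my_memcpy` is hypothetical, and like memcpy it assumes the source and destination do not overlap.

```c
#include <stddef.h>

/* Hypothetical memcpy-like function: copies n bytes using only
   unsigned char lvalues, which may access any object's representation. */
void *my_memcpy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
    return dst;
}
```

Calling `my_memcpy(&b, &a, sizeof a)` on two long long objects leaves b holding the same value as a, without either object ever being accessed through an incompatible non-character type.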

Ross Ridge
  • "Deduplicator is correct." - sorry, no, he wasn't -- in the original (now apparently deleted) comments, he asserted that an array like `char c[sizeof(long)];` could be reinterpreted and aliased through an lvalue of type `long`. Since there is no object of type `long` there, the behavior is undefined. The quote about trap representations does not say anything about this issue; it's about a slightly different issue describing another family of undefined behaviors. – The Paramagnetic Croissant Aug 12 '14 at 16:57
  • I deleted my previous comment as it was based on an incorrect reading of the standard. You're right that a compiler can assume a dereference of a `long` pointer can't alias array `c` because it knows the effective type of `c`. But it can't generally assume a pointer to `long` and a pointer to `char` can't point to the same object, because it generally can't know and can't assume the effective type of the object the `char` pointer points to. So there is a two-way street when the `char` object doesn't have a declared type, and effectively one when using pointers the compiler can't trace back to a definition. – Ross Ridge Aug 12 '14 at 20:13
  • Your first paragraph sounds as if you're saying OP's second snippet would not be UB if `foo` were initialized (when in fact it would be UB); could you please clarify that. – M.M Jun 08 '17 at 23:22
  • Related to your code example, there can still be trap representations without padding bits (specific example - negative-zero on a 1's complement or sign-magnitude implementation that doesn't support negative zeroes) – M.M Jun 08 '17 at 23:23