Strict aliasing rule and 'char *' pointers

Question

The accepted answer to "What is the strict aliasing rule?" mentions that you can use char * to alias another type, but not the other way around.

It doesn't make sense to me — if we have two pointers, one of type char * and another of type struct something * pointing to the same location, how is it possible that the first aliases the second but the second doesn't alias the first?

You can read a `T` via a `char *`, but you can't read an arbitrary `char` buffer via a `T *`. — Oliver Charlesworth, May 24 '14 at 18:09
It's a rule, nothing else... Basically allowing the compiler to optimise more (as with `restrict`)... But also compiler guys being lazy IMHO... — Macmade, May 24 '14 at 18:13
@OliCharlesworth what about writes then? Is it allowed to write to `T` via `char *`? — user3489275, May 24 '14 at 18:14
this is one of those places where C/C++ don't work the same way — Grady Player, May 24 '14 at 18:14
@user3489275: No: The lifetime of an object ends when the memory in which it is stored is reused. If the object's type has a non-trivial destructor, it is UB to do so without calling the destructor. — Kerrek SB, May 24 '14 at 18:18
@KerrekSB how does object lifetime relate to strict aliasing? — user3489275, May 24 '14 at 18:19
@user3489275: Well, you want to alias objects, but that only makes sense if the objects exist. So if there's no more object (because you reused the storage), then there's no point in aliasing. — Kerrek SB, May 24 '14 at 18:20
@KerrekSB so you are trying to say that casts are not allowed at all? — user3489275, May 24 '14 at 18:22
@user3489275: Wait, maybe my reference wasn't clear - casting the pointer is OK, but not writing to the memory of an object through a char pointer. I.e. when you write to the memory, you invalidate the original object. Reading the bytes of the underlying representation of an object through a char pointer is perfectly fine (and indeed this is how any I/O works). — Kerrek SB, May 24 '14 at 18:26
@KerrekSB: It is **not** UB to reuse the memory (and thus terminate lifetime) of an object with a non-trivial destructor. Weird as it might sound, the standard explicitly states that this is only undefined if the program depends on side effects of the destructor. Also, I don't think that writing through the `char*` *ends* the lifetime of the object. Using placement new, sure, just writing through a pointer... not so sure. — David Rodríguez - dribeas, May 24 '14 at 20:00
@DavidRodríguez-dribeas: Hm, fair enough, I should have said "destructor with side effects". — Kerrek SB, May 24 '14 at 20:51
@KerrekSB: Well, if the side effects of the destructor don't affect the observable behavior of the program, then the program does not *depend* on the destructor being executed. — David Rodríguez - dribeas, May 24 '14 at 20:54
They do alias one another. However, of course, you can't access a `char` object through an incompatible reference type. I explain here: http://stackoverflow.com/questions/29121176/can-aliasing-problems-be-avoided-with-const-variables/29217925#29217925 — jschultz410, Mar 30 '15 at 19:45
The answer to this question is "Because the standard says so" — M.M, Apr 11 '15 at 23:52
@GradyPlayer "_C/C++ don't work the same way_" Please elaborate — curiousguy, Aug 15 '15 at 03:03
@OliverCharlesworth "_you can't read an arbitrary char buffer via a T *._" Yes. Alignment alone implies you can't. — curiousguy, Aug 15 '15 at 03:04
@KerrekSB "_Reading the bytes of the underlying representation of an object through a char pointer is perfectly fine (and indeed this is how any I/O works)._" Please elaborate — curiousguy, Aug 15 '15 at 03:08
@DavidRodríguez-dribeas "_Also, I don't think that writing through the char* ends the lifetime of the object._" a polymorphic object or a POD? — curiousguy, Aug 15 '15 at 03:10
I would guess that it can be because `char` is a single byte so `char*` can represent a sequence of bytes. — Roy Avidan, Aug 18 '20 at 21:47

score 18 · Answer 1 · answered Jun 16 '15 at 18:46

if we have two pointers, one of type char * and another of type struct something * pointing to the same location, how is it possible that the first aliases the second but the second doesn't alias the first?

It does, but that's not the point.

The point is that if you have one or more struct somethings then you may use a char* to read their constituent bytes, but if you have one or more chars then you may not use a struct something* to read them.
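The two directions can be sketched in C++ as follows. This is a minimal illustration, not a definitive treatment: `something` here is a hypothetical two-member struct, and `bytes_of` and `from_bytes` are names invented for this example. It uses `unsigned char`, which enjoys the same aliasing exemption as plain `char`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for the question's `struct something`.
struct something {
    int id;
    float weight;
};

// Allowed direction: a something object exists, and its constituent bytes
// are read through a character-type pointer.
std::vector<unsigned char> bytes_of(const something& s) {
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&s);
    return std::vector<unsigned char>(p, p + sizeof s);
}

// Opposite direction: only raw bytes exist. Rather than dereferencing a
// something* aimed at them, copy the bytes into a real something object.
something from_bytes(const unsigned char* bytes) {
    something s;
    std::memcpy(&s, bytes, sizeof s);
    return s;
}
```

Going through `std::memcpy` sidesteps the question of whether a `something` exists at the buffer's address, because the destination is always a genuine `something` object.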

answered Jun 16 '15 at 18:46

Lightness Races in Orbit

The reason why a char* is allowed to alias another type is simply because it provides a very simple way to serialize a struct and is a commonly used pattern. – doron Jun 30 '15 at 11:57
and now the OP knows as well. – doron Jun 30 '15 at 15:13
Define "you have one or more chars" – curiousguy Aug 14 '15 at 10:25
@curiousguy: What's unclear? `char buf[sizeof(something)] = {}; something* ptr = reinterpret_cast<something*>(&buf[0]); // invalid` – Lightness Races in Orbit Aug 14 '15 at 10:32
The whole concept of "having". A char is a byte. Every object representation is a bunch of bytes, or chars. You always have one or more chars. – curiousguy Aug 14 '15 at 10:35
@LightnessRacesinOrbit `char buf[sizeof(something)]` is an example NOT a definition. – curiousguy Aug 14 '15 at 11:07
@LightnessRacesinOrbit I don't understand why your sample is invalid. I thought `char` was an exception to strict aliasing. Don't `vector`, `variant`, `optional`, and anything else managing memory use this pattern (plus alignment fixes, though)? – v.oddou Apr 19 '18 at 06:42
@v.oddou: They do it by actually constructing an object in that space (placement new), which is a different example. – Lightness Races in Orbit Apr 19 '18 at 13:56
how so ? you have to cast the memory content to the user type when the user calls `get<0>()` or this kind of stuff, it looks exactly like your invalid line. – v.oddou Apr 20 '18 at 02:44
@v.oddou: They _actually constructed an object in that space_ (with placement new) beforehand, so the cast is valid and correct. In the `char buf[sizeof(something)] = {}` example above, though, that has not happened - there is no `something` in existence and you can't just pretend that there is. – Lightness Races in Orbit Apr 26 '18 at 12:28
Ahah, of course. lol omg, I thought we were not talking at this level. It's obvious that in those exact 2 lines of code, the unconstructed object is invalid to use. But we were talking legalese; the topic being aliasing, it could work if a 3rd line did `new (ptr) something{};` just after that, no? The question is not that; it's whether or not it is UB because of aliasing rules (completely ignoring that alignof(something) might not be 1). – v.oddou Apr 26 '18 at 14:21
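The placement-new point made in the comments above can be sketched like this. It is only an illustration under stated assumptions: `something` is a hypothetical one-member struct, `construct_then_use` is a name invented here, and the buffer is declared with `alignas` and `unsigned char` so that alignment and storage are not in question.

```cpp
#include <cassert>
#include <new>

// Hypothetical stand-in for the `something` type discussed in the comments.
struct something {
    int value;
};

int construct_then_use() {
    // Raw storage with suitable size and alignment; no something lives here yet.
    alignas(something) unsigned char buf[sizeof(something)];

    // Placement new creates a something in that storage and returns a pointer
    // to it, so later accesses go to an object that really exists.
    something* ptr = new (buf) something{42};

    int result = ptr->value;
    ptr->~something();  // trivial for this type, shown for completeness
    return result;
}
```

The difference from the invalid snippet is not the cast but the existence of the object: here a `something` has been constructed before anything is read through a `something*`.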

score 12 · Answer 2 · edited Jul 17 '22 at 09:43

The wording in the referenced answer is slightly erroneous, so let’s get that ironed out first: One object never aliases another object, but two pointers can “alias” the same object (meaning, the pointers point to the same memory location — as M.M. pointed out, this is still not 100% correct wording but you get the idea). Also, the standard itself doesn’t (to the best of my knowledge) actually talk about strict aliasing at all, but merely lays out rules that govern through which kinds of expressions an object may be accessed or not. Compiler flags like -fno-strict-aliasing tell the compiler whether it can assume the programmer followed those rules (so it can perform optimizations based on that assumption) or not.

Now to your question: Any object can be accessed through a pointer to char, but a char object (especially a char array) may not be accessed through most other pointer types. Based on that, the compiler is required to make the following assumptions:

If the type of the actual object itself is not known, both char* and T* pointers may always point to the same object (alias each other) — symmetric relationship.
If types T1 and T2 are not “related” and neither is char, then T1* and T2* may never point to the same object — symmetric relationship.
A char* pointer may point to a char object or an object of any type T.
A T* pointer may not point to a char object — asymmetric relationship.
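The assumptions listed above can be made concrete with a small sketch. The function names are invented for this illustration, and the comments describe what an optimizer is permitted to assume, not what any particular compiler is guaranteed to emit.

```cpp
#include <cassert>

// int and float are unrelated types, so the compiler may assume the store
// through `f` cannot change `*i` and may fold the return into `return 1;`.
int store_int_then_float(int* i, float* f) {
    *i = 1;
    *f = 2.0f;
    return *i;
}

// A char* may point into an object of any type, so the compiler has to
// allow for `c` pointing into `*i` and must reload `*i` after the store.
int store_int_then_char(int* i, char* c) {
    *i = 1;
    *c = 2;
    return *i;
}
```

Calling `store_int_then_char(&x, reinterpret_cast<char*>(&x))` is therefore fine and returns something other than 1 (assuming `sizeof(int) > 1`), whereas passing the same storage for both parameters of `store_int_then_float` would break the rules the optimizer relies on.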

I believe the main rationale behind the asymmetric rules about accessing objects through pointers is that a char array might not satisfy the alignment requirements of, e.g., an int.

So, even without compiler optimizations based on the strict aliasing rule, writing an int to the location of a 4-byte char array at addresses 0x1, 0x2, 0x3, 0x4, for instance, will — in the best case — result in poor performance and — in the worst case — access a different memory location, because the CPU instructions might ignore the lowest two address bits when writing a 4-byte value (so here this might result in a write to 0x0, 0x1, 0x2, and 0x3).
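One commonly used way to avoid both the alignment hazard described above and the aliasing question is to move the value with `std::memcpy` instead of dereferencing an `int*` at an odd address. The following is a minimal sketch; `roundtrip_unaligned` is a name invented for this example.

```cpp
#include <cassert>
#include <cstring>

int roundtrip_unaligned(int value) {
    // The buffer itself is int-aligned, so buf + 1 is deliberately NOT
    // suitably aligned for int whenever alignof(int) > 1.
    alignas(int) char buf[2 * sizeof(int)] = {};
    char* unaligned = buf + 1;

    // Byte-wise copies: no int lvalue is ever formed at the unaligned address.
    std::memcpy(unaligned, &value, sizeof value);

    int out;
    std::memcpy(&out, unaligned, sizeof out);
    return out;
}
```

Compilers typically lower such fixed-size copies to whatever load and store sequence the target handles correctly, though that is an implementation detail, not a guarantee.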

Please also be aware that the meaning of “related” differs between C and C++, but that is not relevant to your question.

edited Jul 17 '22 at 09:43

ib.

answered Apr 16 '15 at 13:34

MikeMB

Alignment is NOT a rationale for the strict aliasing rule, let alone the main one. It's an orthogonal issue. The reason for the aliasing rule is to enable optimizations. (sometimes called TBAA - type-based aliasing analysis). Further, the rule is not about pointers aliasing each other either. It is about an lvalue aliasing an object. – M.M Jun 30 '15 at 11:54
@M.M: I didn't say it was the main reason for the strict aliasing rule itself, but for why, e.g., a `char*` may point to an `int` while an `int*` may not point to a char array. I corrected my post, so it's no longer talking about two pointers aliasing each other. Maybe you can have another look? – MikeMB Jun 30 '15 at 12:46
Also take out the paragraph starting "The main rationale" ; alignment is not a rationale for aliasing rules – M.M Jun 30 '15 at 12:57
@M.M: Sorry, I happen to disagree with you on this part, and the following paragraph explains why I think it is a valid rationale. Once again, I'm not saying it is a rationale for the strict aliasing rule itself, but for that specific part of it. – MikeMB Jun 30 '15 at 13:15
The aliasing rules apply even for aligned memory. – curiousguy Aug 14 '15 at 10:42
"*If the type of the actual object itself is not known*" Can you tell me how the compiler decides whether the object itself is known or not? For example, a generic allocator implementation may use 64K char buffers, and here and there it must alias the allocated block header structs onto them to write the necessary tracking data. The intent is to store the allocated blocks, which can be aliased by char*, but if the compiler thinks the buffer holds char objects then I cannot alias the block headers onto it without breaking the rule. – Calmarius Nov 28 '15 at 10:45
@Calmarius: Not sure if I understand your question. If you access the value of a char object via an expression of a different type (even by dereferencing a pointer to a POD), you are always violating the strict aliasing rule, whether the compiler knows it's actually a char or not. However, you can create an object of the appropriate type inside a char array, e.g. via placement new. – MikeMB Nov 28 '15 at 15:50
For example, do I invoke undefined behavior by writing this: `char *pC = malloc(123); int *pI = (int*)pC; *pI = 42;`? My rationale here is that `malloc` provides the best-aligned pointers because it doesn't know what I use the buffer for, so misalignment cannot happen, and then I only access the area via an int pointer. Allocators do something like this: they allocate buffers with `malloc`, store the buffer in a `char*`, then when allocating, they use pointer arithmetic to get an aligned address and then dereference it only via the header struct. – Calmarius Nov 28 '15 at 17:18
I mostly use C. But I'm curious what's the situation in C++. – Calmarius Nov 28 '15 at 17:36
@Calmarius: In C I believe this should be legal; in C++ this won't even compile (malloc returns void*, which can't be implicitly converted to char*). Aside from that, I'm not sure if it is legal in C++ (definitely not for types that aren't trivially copyable, maybe for PODs), but I'd have to look it up in the standard again. The C++ way to do it would be placement new: `new (pI) int(42);` – MikeMB Nov 28 '15 at 18:03
The reason for allowing objects to be accessed via character pointers is to support, even within portable programs, the idiom of copying the value of one object of type T to another value of the same type by copying `sizeof(T)` bytes from the source object's address to the destination object's address. The Committee saw no reason why portable programs would need to declare a block of storage of type `char` and access it using some other type, and sought to waive jurisdiction over any constructs that wouldn't be useful in portable programs. They never imagined that waiving jurisdiction... – supercat Jul 18 '22 at 18:14
...over non-portable constructs would be interpreted as implying any judgment that such constructs should be forbidden even within non-portable programs. – supercat Jul 18 '22 at 18:15
@supercat: Then why did they make it UB instead of Implementation Defined Behavior? – MikeMB Jul 18 '22 at 21:49
@MikeMB: The latter phrase refers either to constructs which *all* implementations are expected to process in a consistent documented fashion, or to *syntactic* constructs whose behavior is defined in useful fashion by many implementations but is *never* defined by the Standard (e.g. integer-to-pointer casts). By contrast, according to the published Rationale, the former phrase, among other things, identifies areas of "conforming language extension" by allowing implementations to process a construct usefully (typically "in a documented manner characteristic of the environment"). – supercat Jul 18 '22 at 22:02
@MikeMB: It's also important to understand that the Standard embraces the misguided philosophy that it should not recognize cases where optimization might cause defined programs to behave in ways observably inconsistent with sequential program execution, and thus embraces the even-worse corollary that any construct whose behavior might be observably affected by a useful optimization must be classified as UB *even if the Standard would otherwise define the behavior of that construct*. Since the Standard wouldn't forbid implementations from corrupting memory if execution gets stuck... – supercat Jul 18 '22 at 22:22
...in a side-effect free loop, both clang and gcc will sometimes generate code which corrupts memory if code gets stuck in a side-effect free loop, even if such corruption could not occur as a result of executing the loop as written, nor as a result of omitting the execution of a chunk of code which contains a loop, but which can be shown not to perform any useful computations. – supercat Jul 18 '22 at 22:27

score 4 · Answer 3 · answered Jun 30 '15 at 12:03

if we have two pointers, one of type char * and another of type struct something * pointing to the same location, how is it possible that the first aliases the second but the second doesn't alias the first?

Pointers don't alias each other; that's sloppy use of language. Aliasing is when an lvalue is used to access an object of a different type. (Dereferencing a pointer gives an lvalue).

In your example, what's important is the type of the object being aliased. For a concrete example let's say that the object is a double. Accessing the double by dereferencing a char * pointing at the double is fine because the strict aliasing rule permits this. However, accessing a double by dereferencing a struct something * is not permitted (unless, arguably, the struct starts with double!).
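As a hedged illustration of the `double` case: the sketch below reads one byte of a `double` through a character-type lvalue, which the rule permits, and uses `std::memcpy` where one might be tempted to dereference a pointer to an unrelated type. The names `first_byte` and `bits_of` are invented here, `std::uint64_t` stands in for "some unrelated type", and the code assumes an IEEE-754 64-bit `double`, which the `static_assert` checks.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

static_assert(std::numeric_limits<double>::is_iec559 &&
                  sizeof(double) == sizeof(std::uint64_t),
              "this sketch assumes an IEEE-754 64-bit double");

// Permitted: the double is accessed through an unsigned char lvalue.
unsigned char first_byte(const double& d) {
    return *reinterpret_cast<const unsigned char*>(&d);
}

// Not permitted: *reinterpret_cast<const std::uint64_t*>(&d), because
// std::uint64_t is neither a character type nor related to double.
// Copying the bytes into a real std::uint64_t object is fine.
std::uint64_t bits_of(double d) {
    std::uint64_t u;
    std::memcpy(&u, &d, sizeof u);
    return u;
}
```

Under the IEEE-754 assumption, `bits_of(1.0)` yields `0x3FF0000000000000`.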

If the compiler is looking at a function which takes char * and struct something *, and it does not have information available about the object being pointed to (this is actually unlikely, as aliasing passes are done at a whole-program optimization stage), then it would have to allow for the possibility that the object might actually be a struct something, so no optimization could be done inside this function.

answered Jun 30 '15 at 12:03

M.M

Dereferencing a struct has no effect, it doesn't read memory. – curiousguy Aug 18 '15 at 16:48
@curiousguy the lvalue produced by dereferencing a pointer to struct may be used to read or write memory. For example `struct something *x = whatever; *x = bla;` – M.M Aug 19 '15 at 00:39
I'd describe aliasing as occurring when storage is addressed or accessed via two independent means, each within the active lifetime of the other. If one were to define aliasing in that fashion, and specify that 6.5p7 only applies in cases that would actually involve that form of aliasing, that would eliminate the need for the Effective Type nonsense as well as the character-type exception, and would simultaneously allow more optimizations than are permitted under C99 while allowing the use of more code that would otherwise require `-fno-strict-aliasing`. – supercat Jul 12 '18 at 20:48
If code were to say `short *p =(short*)someObject; *p+=1;` and never use `p` again, that should not be considered aliasing because the active lifetime of `p` would only extend until its last use, and `someObject` would never be accessed via any means other than `p` within that time. If, however, code were to perform some accesses to `someObject` (without using `p`) before accessing those same parts with `p`, the accesses to `someObject` would alias the lvalue `*p` that was actively associated with those parts. Recognizing what aliasing is and isn't would simplify the Standard hugely. – supercat Jul 12 '18 at 20:53

score 0 · Answer 4 · answered Jul 21 '22 at 18:06

Many aspects of the C++ Standard are derived from the C Standard, which needs to be understood in the historical context when it was written. If the C Standard were being written to describe a new language which included type-based aliasing, rather than describing an existing language which was designed around the idea that accesses to lvalues were accesses to bit patterns stored in memory, there would be no reason to give any kind of privileged status to the type used for storing characters in a string. Having explicit operations to treat regions of storage as bit patterns would allow optimizations to be simultaneously more effective and safer. Had the C Standard been written in such fashion, the C++ Standard presumably would have been likewise.

As it is, however, the Standard was written to describe a language in which a very common idiom was to copy the values of objects by copying all of the bytes thereof, and the authors of the Standard wanted to allow such constructs to be usable within portable programs.

Further, the authors of the Standard intended that implementations process many non-portable constructs "in a documented manner characteristic of the environment" in cases where doing so would be useful, but waived jurisdiction over when that should happen, since compiler writers were expected to understand their customers' and prospective customers' needs far better than the Committee ever could.

Suppose that in one compilation unit, one has the function:

void copy_thing(char *dest, char *src, int size)
{
  while(size--)
    *(char volatile *)(dest++) = *(char volatile*)(src++);
}

and in another compilation unit:

float f1,f2;
float test(void)
{
  f1 = 1.0f;
  f2 = 2.0f;
  copy_thing((char*)&f2, (char*)&f1, sizeof f1);
  return f2;
}

I think there would have been a consensus among Committee members that no quality implementation should treat the fact that copy_thing never writes to an object of type float as an invitation to assume that the return value will always be 2.0f. There are many things about the above code that should prevent or discourage an implementation from consolidating the read of f2 with the preceding write, with or without a special rule regarding character types, but different implementations would have different reasons for their forbearance.

It would be difficult to describe a set of rules which would require that all implementations process the above code correctly without blocking some existing or plausible implementations from implementing what would otherwise be useful optimizations. An implementation that treated all inter-module calls as opaque would handle such code correctly even if it was oblivious to the fact that a cast from T1 to T2 is a sign that an access to a T2 may affect a T1, or the fact that a volatile access might affect other objects in ways a compiler shouldn't expect to understand. An implementation that performed cross-module in-lining and was oblivious to the implications of typecasts or volatile would process such code correctly if it refrained from making any aliasing assumptions about accesses via character pointers.

The Committee wanted to recognize something in the above construct that compilers would be required to recognize as implying that f2 might be modified, since the alternative would be to view such a construct as Undefined Behavior despite the fact that it should be usable within portable programs. Their choice of the character-pointer access as the aspect that forces the issue was never intended to imply that compilers should be oblivious to everything else, even though some compiler writers unfortunately interpret the Standard as an invitation to do just that.
