
I'm trying to understand the strict aliasing rule for C and C++. I've asked lots of questions about this and done a bit of reading on it, but I just want to clarify something.

// void* can alias any other type:
int anInt;
void* pToVoid = (void*)&anInt; // Allowed
// So can char*
char* pToChar = (char*)&anInt; // Allowed

A pointer to any type can alias void*; that's why we can do something like:

int* myNewInt = (int*)malloc(sizeof(int));

But:

(Question 1) Can any pointer type alias char pointer?

char myChars[4];
int* pInt = (int*)myChars; // Is this allowed?
// I'm guessing so because this is how we create buffers
float* pFloat = (float*) pInt; // I know this is strict aliasing violation

Question 2: Also when aliasing any pointer type to a char or void pointer type we need ensure correct alignment, right? There's no guarantee on the stack that a char or char array will be aligned as we get it from new or malloc, right?

My third question is if the strict aliasing rule is violated when you cast a pointer or when a pointer aliases the same memory? For example:

struct MyStruct
{
    int myInt;
    float myFloat;
};

int main()
{
    MyStruct myStructObj;
    float* pFloat = (float*)&myStructObj.myInt; // This is aliasing the wrong type, not allowed
// However if I move the float* then it no longer aliases the wrong type
    pFloat += 1;
// Now the pointer points to the right type. However is it now too late? My program
// has UB because I first aliased the pointer in the first place?
// On the other hand I assume this is allowed though:
   float pFloat = (float*)(((char*)&myStructObj.myInt) + sizeof(int));
// This way the float pointer never aliases the int, the int pointer is 
// first cast to char*, then char* to float*, which I assume is allowed.
}

In other words is the strict aliasing rule about accessing the same memory or assigning different pointer types? Because if it's only about memory access then my example of assigning the float* to the int* is fine because I move it first, right?

Edit: It's been pointed out that the aliasing rules are different for C and C++, therefore I've tagged this to be about C++.

Zebrafish
  • Maybe this helps: https://stackoverflow.com/questions/98650/what-is-the-strict-aliasing-rule. But from my experience... doing this kind of thing with pointers will bite you in the end. – Pepijn Kramer Aug 20 '21 at 07:15
  • Are you asking about C or C++? They are separate languages and the aliasing rules are different; please only tag one of them. – Alan Birtles Aug 20 '21 at 07:16
  • The conversion itself is not a problem, dereferencing and arithmetic are. – molbdnilo Aug 20 '21 at 07:25
  • Read (or re-read) this page here https://en.cppreference.com/w/cpp/language/reinterpret_cast and then ask for clarification if you still need it - you basically need this page posting as an answer. – Richard Critten Aug 20 '21 at 08:01
  • @RichardCritten I've read it, but I still don't get the fine details. According to a comment above, "The conversion itself is not a problem, dereferencing and arithmetic are". I don't see why int a; float b; float* pToFloat = (float*)&a; ((char*)pToFloat) + 4; is illegal. I've converted a float pointer to a char*, which is fine to alias, and I do arithmetic on the char*, not the float*. – Zebrafish Aug 20 '21 at 09:58
  • _"...arithmetic on the char* ..."_ is fine - it all depends on what you do with the pointer after the arithmetic. The __Type aliasing__ restriction is - _"Whenever an attempt is made to __read or modify the stored value of an object__ of type DynamicType through a glvalue of type AliasedType, the behavior is undefined unless one of the following is true:"_ – Richard Critten Aug 20 '21 at 10:14
  • `float pFloat = (float*)...;` is converting pointer value to a `float`. It's invalid. You are mixing pointer conversion with strict aliasing. Converting pointer __value__ `type *a = some_other_pointer` is not __accessing__ data behind the pointer `std::cout << *a;`. You seem to use the word "aliasing" when you mean "converting". – KamilCuk Aug 20 '21 at 10:25
  • `if it's only about memory access then my example of assigning the float* to the int* is fine because I move it first` What do you "move first"? – KamilCuk Aug 20 '21 at 10:51
  • Suggest you ask another question with a single simple example (see [mcve]), e.g. "Is ... UB?". Then ask another, with one simple example, if you still have further points to clarify. SO is not suited for long discussion in comments. A few self-contained questions seem to work better. – Richard Critten Aug 20 '21 at 11:00

2 Answers


A strict aliasing violation is about accessing data through an lvalue of an incompatible type. In your code, you never access the data. Converting pointers just converts pointer values; by itself it has nothing to do with alias violations.
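To make the distinction concrete, here is a minimal sketch (the function names `convert_only` and `byte_sum` are hypothetical, chosen for illustration). Converting the pointer is harmless on its own; only dereferencing the converted pointer would be an access through an incompatible type, while reading an object's bytes through `unsigned char` is one of the permitted cases.

```cpp
#include <cstddef>

// Hypothetical helper: converting a pointer only changes the pointer value.
// No object is read or written here, so there is no aliasing violation.
float* convert_only(int& i) {
    // Reading *result later would access an int through a float lvalue: UB.
    return reinterpret_cast<float*>(&i);
}

// Hypothetical helper: inspecting the bytes of any object through
// unsigned char is explicitly allowed by the aliasing rule.
unsigned byte_sum(const int& i) {
    const unsigned char* bytes = reinterpret_cast<const unsigned char*>(&i);
    unsigned sum = 0;
    for (std::size_t n = 0; n < sizeof i; ++n) {
        sum += bytes[n];  // access through unsigned char: permitted
    }
    return sum;
}
```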

void* can alias any other type:

Yes, you can convert any other object pointer value to a pointer of type void *.
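A minimal sketch of the round trip, assuming an ordinary object pointer (function pointers are a separate case): converting to `void*` and back to the original type with `static_cast` gives back the original pointer. `round_trip` is a hypothetical name.

```cpp
// Hypothetical helper: object pointer -> void* -> original type.
int* round_trip(int* p) {
    void* v = p;                  // implicit conversion to void*, no access
    return static_cast<int*>(v);  // back to int*, same pointer value as p
}
```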

Can any pointer type alias char pointer?

A pointer has to point to a memory location that is suitably aligned for the type it points to. See C11 6.3.2.3p7, C++draft expr#static.cast-13 and C++draft expr#reinterpret.cast-7. As I read them, when the pointer is not correctly aligned, the result is undefined in C and unspecified in C++.
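One way to check alignment before converting, as a sketch: the helper `is_aligned_for` below is hypothetical, and mapping a pointer to `std::uintptr_t` is implementation-defined, though this remainder check is the usual approach on mainstream platforms.

```cpp
#include <cstdint>

// Hypothetical helper: true if p is suitably aligned for an object of type T.
// The pointer-to-integer mapping is implementation-defined; on common
// platforms it is the plain address, so the remainder test is meaningful.
template <typename T>
bool is_aligned_for(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % alignof(T) == 0;
}
```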

float* pFloat = (float*) pInt; // I know this is strict aliasing violation

No, it is not; you do not access the data behind the pointer. Assuming the pointer is properly aligned (it may not be) and sizeof(float) <= sizeof(myChars) (which may not hold): if you then did, for example, *pFloat = 1.0;, you would actually access the data, and only then could an alias violation come into play. I think the question "Is using the result of new char[] or malloc to casted float * is UB (strict aliasing violation)?" nicely answers all cases.
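If the goal is only to move a float's value through a char buffer, a sketch that sidesteps both the alignment and the aliasing question is to copy the bytes with `std::memcpy` (the function name `through_buffer` is hypothetical):

```cpp
#include <cstring>

// Hypothetical helper: store a float's bytes in a char buffer and read them
// back without ever dereferencing a float* that points into the buffer.
// memcpy has no alignment requirement and does not violate aliasing.
float through_buffer(float value) {
    unsigned char buffer[sizeof(float)];
    std::memcpy(buffer, &value, sizeof value);    // write the bytes
    float result;
    std::memcpy(&result, buffer, sizeof result);  // read them back as a float
    return result;
}
```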

There's no guarantee on the stack that a char or char array will be aligned as we get it from new or malloc, right?

Yes.
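A sketch of how to get correctly aligned stack storage, assuming the goal is to place an `int` in a local buffer: `alignas` requests the alignment, and placement new creates an actual `int` object there, so accessing it through the returned pointer is fine. `make_int_in_buffer` is a hypothetical name.

```cpp
#include <new>

// Hypothetical helper: a plain char array only guarantees char alignment,
// so alignas(int) is used to request int alignment explicitly.
int make_int_in_buffer(int value) {
    alignas(int) unsigned char storage[sizeof(int)];
    // Placement new creates an int object in the buffer.
    int* p = ::new (static_cast<void*>(storage)) int(value);
    int copy = *p;  // int object accessed through an int lvalue
    return copy;    // int is trivially destructible; no destructor call needed
}
```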

is if the strict aliasing rule is violated when you cast a pointer or when a pointer aliases the same memory?

No, casting a pointer does not access the data. No, two pointers pointing to the same location do not access the data behind them.

is the strict aliasing rule about accessing the same memory or assigning different pointer types?

Only about accessing.

Because if it's only about memory access then my example of assigning the float* to the int* is fine because I move it first, right?

The aliasing rule is only about access, but other rules affect that code. There is no guarantee that MyStruct::myFloat is at offset sizeof(int) from the start of the struct, because the compiler can insert padding between structure members. Use the offsetof macro.
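A sketch of the offsetof approach applied to the question's `MyStruct` (`float_member` is a hypothetical helper). In real code, `&s.myFloat` is the simpler way to get the same pointer.

```cpp
#include <cstddef>

struct MyStruct {
    int myInt;
    float myFloat;
};

// Hypothetical helper: locate myFloat using offsetof instead of assuming it
// starts at sizeof(int), since padding may be inserted between members.
float* float_member(MyStruct& s) {
    char* base = reinterpret_cast<char*>(&s);
    return reinterpret_cast<float*>(base + offsetof(MyStruct, myFloat));
}
```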

KamilCuk
  • Pointer casting _can_ access the data in C++, I think. I'd have to double-check in which inheritance cases this can occur, but IIRC there were multiple-inheritance cases where the compiler needs to have a peek at the vtable (or something similar) to find a base offset at runtime. – MSalters Aug 20 '21 at 11:17
  • @MSalters: See the Notes section on the page I referenced in my (apparently inferior) answer https://en.cppreference.com/w/cpp/language/reinterpret_cast#Type_aliasing. You are right, not all pointers are interconvertible. – Tiger4Hire Aug 20 '21 at 12:13
  • That said, there are no cases where the v-table needs to be accessed (that just makes no sense). What you describe is related to multiple inheritance and downcasting, unrelated to type aliasing. See here https://stackoverflow.com/questions/48997211/c-vtable-in-multiple-inheritance-pointer-to-thunk-method – Tiger4Hire Aug 20 '21 at 12:29

The problem:
In C++, aliasing is all about register usage. It is not uncommon for floating-point values to be held in registers that are different from those used for integers. In effect, when the compiler caches a value, it must bind it to a register. If you load the same value as different types, you can end up with more than one register bound to the same value (say, one register with the int value and one with a float version).
One way to solve this issue is to tell the compiler to always write to memory (disabling some register caching, which makes the code slower).
The other is to promise the compiler that it can cache values any way it sees fit, and that you will never change the type.
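A common illustration of the caching that the rule permits (my own example, not taken from the answer; `store_and_reload` is a hypothetical name): because an `int` and a `float` are assumed not to share storage, the compiler may return the `1` it just stored instead of reloading `*i` after the store through `f`.

```cpp
// Hypothetical example: under strict aliasing the store through f is assumed
// not to modify *i, so an optimizing compiler may compile the final read as
// "return 1" without reloading from memory.
int store_and_reload(int* i, float* f) {
    *i = 1;
    *f = 2.0f;  // assumed not to alias *i
    return *i;  // may be folded to the constant 1
}
```

Called with two distinct objects, as below, the result is 1 with or without that optimization; the assumption only matters if someone passes pointers to the same storage, which is exactly the case the rule makes undefined.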

Utilities
The modern solution is spans (std::span). For older compilers, a library was made to support the same approach: gsl::span. Both provide "as_bytes" functions.

The modern way of safely handling such problems is to model your memory as std::byte (gsl::byte in older code). As the answer to "Has a std::byte pointer the same aliasing implications as char*?" says, the standard lets std::byte be used to access the bytes of any object, just like char and unsigned char, so it is not prone to aliasing issues. This approach does force you to copy explicitly, which is probably by design.
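A sketch of the byte-copy approach in C++17 terms, assuming a trivially copyable type (`to_bytes` is a hypothetical helper, not a GSL or standard function). It reads the object through `const std::byte*`, which is permitted, and copies into a separate array with `std::copy`. In C++20, `std::as_bytes(std::span{&value, 1})` gives a read-only `std::byte` view of the same bytes without copying.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <type_traits>

// Hypothetical helper: copy the object representation of a trivially
// copyable value into an array of std::byte.
template <typename T>
std::array<std::byte, sizeof(T)> to_bytes(const T& value) {
    static_assert(std::is_trivially_copyable_v<T>, "T must be trivially copyable");
    // Reading any object through const std::byte* is allowed.
    const std::byte* first = reinterpret_cast<const std::byte*>(&value);
    std::array<std::byte, sizeof(T)> out{};
    std::copy(first, first + sizeof(T), out.begin());
    return out;
}
```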

Update:
Your Questions:

  1. Yes, any pointer to char avoids aliasing issues (but prefer byte)
  2. If you create your objects as their own types and generate byte spans from them, alignment is never an issue: std::byte has an alignment requirement of 1.
  3. Aliasing is violated whenever you dereference pointers to the same memory as different types. As stated above, this risks creating two cached values, which may then become out of sync.

Update: I just discovered std::bit_cast (https://en.cppreference.com/w/cpp/numeric/bit_cast). Cool.
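A sketch of how `std::bit_cast` might be used here, with a `std::memcpy` fallback for standard libraries that do not provide it yet. It assumes a 32-bit IEEE-754 `float`; `float_bits` is a hypothetical name.

```cpp
#include <cstdint>
#include <cstring>
#if __has_include(<bit>)
#include <bit>
#endif

static_assert(sizeof(float) == sizeof(std::uint32_t), "assumes a 32-bit float");

// Hypothetical helper: reinterpret a float's bits as an integer without an
// aliasing violation. Uses std::bit_cast (C++20) when the library has it.
std::uint32_t float_bits(float f) {
#if defined(__cpp_lib_bit_cast)
    return std::bit_cast<std::uint32_t>(f);
#else
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);  // fallback: copy the bytes
    return u;
#endif
}
```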

Additional
Sorry, I just realised I missed something from your question. Casting an arbitrary char pointer to a pointer to a basic type (like int) is not guaranteed to work. This has nothing to do with aliasing, though; it has to do with the CPU's alignment requirements. It will work on x86/x64, because the CPU supports unaligned access. Not all CPUs do. It is always valid to copy from a char/byte pointer, though.
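A sketch of the copy approach for reading an integer at an arbitrary, possibly misaligned, byte offset; `load_u32` is a hypothetical helper:

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: read a 32-bit value starting at any byte address.
// Casting p to const std::uint32_t* and dereferencing would require suitable
// alignment (and would access char storage as uint32_t); memcpy needs neither.
std::uint32_t load_u32(const unsigned char* p) {
    std::uint32_t value;
    std::memcpy(&value, p, sizeof value);
    return value;
}
```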

Tiger4Hire
  • this does not answer the question in any way – phön Aug 20 '21 at 08:57
  • It's also so very wrong that I don't know where to start. As a random example, this answer assumes two categories of registers (integer and FP) but any two types can alias in C++, e.g. `std::ofstream` and `std::map`. Those are types that will not even fit in a CPU register. – MSalters Aug 20 '21 at 11:23
  • Well if I'm wrong so is cppreference - The purpose of strict aliasing and related rules is to enable type-based alias analysis, which would be decimated if a program can validly create a situation where two pointers to unrelated types (e.g., an int* and a float*) could simultaneously exist and both can be used to load or store the same memory (see this email on SG12 reflector). – Tiger4Hire Aug 20 '21 at 12:01
  • @Tiger4Hire: Type-based aliasing would be decimated if compilers had to allow for the possibility that lvalues that are used within the same context might alias even when the sets of types with which they are *visibly associated* are completely disjoint. Only a tiny fraction of the programs that clang and gcc make no effort to process meaningfully without `-fno-strict-aliasing` fit that description, however. Further, the only way clang and gcc's behavior could be described as correct and non-buggy would be if one twists the Standard's terminology in some very weird ways... – supercat Aug 20 '21 at 15:01
  • ...that would severely degrade the usefulness of the language. For example, in both compilers, if a region of storage holds a T1 with bit pattern x, and is then written with a T2 that has some other pattern y, that would change the type of the storage to T2, but if code then writes a T2, the two writes of type T2, taken together, might not be regarded as having modified the storage, which may consequently be treated as being of type T1. So the only way to be safe would be to avoid storing any object whose bit pattern might match that of an object earlier written with a different type. – supercat Aug 20 '21 at 15:04
  • @supercat: Sorry, my comment above doesn't make clear this is a quote from cppreference, not my opinion. Personally, I find the copy-a-span-of-bytes approach intuitive, hard to get wrong, and Stroustrup/ISO approved. I have even examined the ASM produced by GCC, and if you use std::copy, it compiles it into really optimal code (try it on Godbolt). The truth is, only the compiler knows where your data really is. Good practice is to respect that, irrespective of the exact rules of pointer casting, etc. (IMHO) – Tiger4Hire Aug 20 '21 at 16:21
  • @Tiger4Hire: What do you make of https://godbolt.org/z/GsqqqKrMc where neither clang nor gcc will accommodate the possibility that i, j, and k might all be zero, even though behavior would seem to be defined in that case because the storage would only ever be read using the last type with which it was written? The fact that the same behavior occurs with both compilers would suggest to me that they both use an abstraction model that would allow compilers to behave as though storing temp to `((long*)p)[k]` sets its dynamic type to `long long`. – supercat Aug 20 '21 at 16:35
  • Nice example, but your example can be made simpler, and it indicates that it considers the array of "long long" and "long" as separate things (like "restrict"ed pointers) Compare with the std::span code equivalent, which visually appears to be correct https://godbolt.org/z/Ee4vWcjvG – Tiger4Hire Aug 20 '21 at 17:20
  • @Tiger4Hire: Of course the reason the compiler malfunctions is that it considers `long` and `long long` to be separate types, but each object is only ever read as the type via which it was last written. Even if one adds placement new constructs, as in https://godbolt.org/z/hszfnT7Wh storing a value to pl2[k] that the compiler recognizes as matching the bit pattern that was stored there has the effect of negating anything that has happened to that storage in the interim, including the placement new of `pl2` as type `long`. – supercat Aug 20 '21 at 17:31
  • @supercat: C++ just doesn't work that way. There is no concept of "bit pattern" for a basic type. There is no runtime component to this problem. You can even see this in the ASM output above. I don't think spending any more time on this is worthwhile. – Tiger4Hire Aug 22 '21 at 09:38
  • @Tiger4Hire: Both clang and gcc exhibit behavior which is inconsistent with the last store setting the dynamic type of the object to the type used for the store, and consistent with it reverting the dynamic type to one that was used earlier to store the same machine-level representation. Is this a result of coincidental matching bugs, or of each compiler's development team treating the fact that the other compiler processes a program nonsensically as evidence that such behavior is acceptable? – supercat Aug 22 '21 at 11:19