Why doesn't GCC and Clang do this aliasing-optimization?

Question

I have a case where a friend casts a non-base class object of type "Base" to a class type object "Derived", where "Derived" is a derived class of "Base" and only adds functions, but no data. In the below code, I did add a data member x to the derived class

struct A {
  int a;
};

struct B : A {
  // int x;
  int x;
};

A a;

int g(B *b) {
   a.a = 10;
   b->a++;
   return a.a;
}

With strict alias analysis on, GCC (also Clang) always returns 10, and not 11, because b can never point to a in a well-defined code. However, if I remove B::x (as is actually the case in the code of my friend), GCC's output assembler code does not optimize the return access of a.a and reloads the value from memory. So the code of my friend that calls g "works" on GCC (as he intended) even though I think it still has undefined behavior

g((B*)&a);

So in essentially the same two cases, GCC optimizes one case and doesn't optimize the other case. Is it because b can then legally point to a? Or is it because GCC just wants to not break real-world code?

I tested the answer that states

If you remove B::x, then B meets the requirements in 9p7 for a standard-layout class and the access becomes perfectly well-defined because the two types are layout-compatible, 9.2p17.

With two layout compatible enums

enum A : int { X, Y };
enum B : int { Z };

A a;

int g(B *b) {
   a = Y;
   *b = Z;
   return a;
}

The assembler output for g returns 1, not 0, even though A and B are layout compatible (7.2p8).

So my further question is (quoting an answer): "two classes with exactly the same layout may be considered "almost the same" and they are left out of the optimization.". Can someone provide a proof of this for GCC or Clang?

Where is `g()` called? With `g(&a)`? I don't think using a global and pointer to the same global variable within the same function is anything but "undefined behaviour". And as we know, undefined behaviour can lead to all manner of things, including "what you expected". — Mats Petersson, Jun 19 '13 at 14:21
@MatsPetersson g is called like `g((B*)&a)`. In my test snippet, it is not called (I only needed the assembler output for `g`) — Johannes Schaub - litb, Jun 19 '13 at 14:23
Do you have "chapter and verse" on it being undefined behaviour? — doctorlove, Jun 19 '13 at 14:23
Can you elaborate on the return access elision, eg by posting relevant assembly outputs? — slaphappy, Jun 19 '13 at 14:28
@doctorlove: It's accessing an object via an expression of incompatible type. `a` may be accessed via expression of type `A`, but not type `B`. — MSalters, Jun 19 '13 at 14:28
Can you clarify that (whether?) your question is spcifically asking why g++ behaves the way it does in the face of this undefined behavior? In other words, is this a compiler question or a language question? — Mark B, Jun 19 '13 at 14:28
@MarkB it's both, in a way. if you find out that it is *not* undefined behavior, I'm all eager to hear it. — Johannes Schaub - litb, Jun 19 '13 at 14:32
Your question should be directed at the GCC mailing list. Why does some implementation do this in an undefined behavior case? Because the implementation's details add up to this through some internal representation. Also, if this is a GCC-ism it might disappear by using `-std=c++11` instead of `-std=gnu++11`. — rubenvb, Jun 19 '13 at 14:39
@doctorlove my understanding is that none of the bullets of 3.10p10 match — Johannes Schaub - litb, Jun 19 '13 at 14:39
I dislike the mentality of some when questions about undefined behavior arise. Undefined behavior does not mean that compiler developers shut off their brain and play dice. And it neither means that they don't care. — Johannes Schaub - litb, Jun 19 '13 at 14:41
I agree, 3.10p10 was what I found - unfortunately, my graphics driver decided to stop working. And I agree, undefined just means that it's up to the compiler vendor to do something as meaningful as possible - but some situations are terribly hard to do something with, and for the compiler to "realize" that a.a and b->a are the same here [or not realize, but play it safe], they would have to store and load the data unnecessarily, just in case your `b` object that isn't supposed to be a pointer to an `a` object is actually the same `a.a`. People prefer fast code for when it is written correctly. — Mats Petersson, Jun 19 '13 at 14:48
@MatsPetersson i am sorry, I think i had a confusing wording in my question. I reworded it, and hope it becomes clearer. Of course the *function itself* has no undefined behavior. — Johannes Schaub - litb, Jun 19 '13 at 15:01
Maybe because the two cases end up in two different paths in "analyze for possible aliasing", and the compiler thinks "well, these objects have the same structure, so could be that they are actually used interchangably", where "in this case, clearly B can't he used for an A object, so I'll we don't need to 'fix' this". Now, I didn't write a single line in gcc/g++ (I did send a small patch to someone working on the x86 variant, but I think it was reworked), so I'm just surmising the logic involved. — Mats Petersson, Jun 19 '13 at 15:06
@JohannesSchaub-litb Undefined behavior _does_ mean that some compiler writers (at least with g++) will try to make the code fail at runtime. (I don't think that this is an issue here, but I've seen it in other cases.) — James Kanze, Jun 19 '13 at 15:11
@JohannesSchaub-litb With regards to why the compiler does two different things: Ben Voigt pointed out one possible reason in his response. I'm not sure I agree with the logic, but the developers of g++ might. In which case, if `B` is layout compatible with `A`, they consider that `b` in `g` might alias global `a`, and if it isn't they don't. (And of course, since their is clearly undefined behavior in C++03, they don't need to do this conditionally.) — James Kanze, Jun 19 '13 at 15:15
@Mats: In the latter case, where `B` has *standard-layout*, it's allowed to alias things, and therefore the compiler has to actually generate accesses. — Ben Voigt, Jun 19 '13 at 15:25
@BenVoigt: So, indeed, it takes two different paths down the "alias analyzer" based on "being same layout" vs. "not the same layout". Good to know. — Mats Petersson, Jun 19 '13 at 15:28
Asking why an optimization has not been applied is a complicated question. The basic answer is that the implementation of that particular compiler does not perform that optimization in that particular case, it could be that the heuristics did not detect this, or that it was never considered or... The more interesting question is just the opposite: *is it guaranteed that it will always reload from memory?* And I don't believe that to be the case, but I think you agree with me here, right? — David Rodríguez - dribeas, Jun 30 '13 at 20:38
@JohannesSchaub-litb: Have you reproduced this while preventing the inlining of `g`? If not, try doing that. (One way to do it is to store `g`'s address in a volatile function pointer variable, then call it from there.) — user541686, Aug 01 '13 at 04:21
@Mehrdad in my testcase, I did not call `g`. I have just compiled it separately without a `main` function. — Johannes Schaub - litb, Feb 04 '17 at 19:17
@JohannesSchaub-litb With which compiler versions is this problem happening? '13 was four years ago, after all. — Iwillnotexist Idonotexist, Feb 06 '17 at 00:26
I tested it on GCC, clang, ICC, all produces the same assembly, as describes by JohannesSchaub-litb. I also try different versions of these compilers. They all 3 agree. If `B` does not declare any non static data member or any virtual function, then the compiler supposes that `*b` can be an alias for `a`. Maybe an early optimization replaces derived class that does not add any data (including vptr) by the base? — Oliv, Feb 06 '17 at 22:55
@Oliv it might be case of two struct\classes being different by last members only as detected by compiler. E..g. it is legal to use a pair such in union and access shared members of each, regardless which struct was written last. We may see strict-aliasing failing because compiler had met an edge case — Swift - Friday Pie, Feb 07 '17 at 06:34
@Swift, I made new tests, If A is polymorphic, and B derives from A, whatever you declare inside B, the compiler never do strict aliasing optimization, whatever the compiler and whatever its version. So this is definitively not related with deepness of layout compatibility. I have read your answer. I am still not 100% satisfied, I was also expecting an answer to: Why all compilers seem that pessimistic when performing strict aliasing optimization? — Oliv, Feb 07 '17 at 09:58
Great question because it's unique and interesting. To get a good answer, I think you should post it on the gcc mailing list by sending an email to gcc@gcc.gnu.org. Then if you get a satisfying answer from them, you can post it here. — Hadi Brais, Feb 09 '17 at 13:27
@Oliv, was the field in question introduced in A or in its base class? — Swift - Friday Pie, Feb 10 '17 at 05:45
@Swift, You will find the code by following the link below (I hope this link will work when you click on it!). There are to function g and gr. gr pointer argument is "restricted" which forces strict aliasing optimization, you can thus compare the assembly generated with or without strict aliasing: [https://godbolt.org/g/JQvuOP] — Oliv, Feb 10 '17 at 09:54
@Oliv I don't have enough time currently, but I just checked your code using clang++ 4. The difference in the translation occurs in the InstCombine-Transformation (you can compare ./opt test.bc -S against ./opt -instcombine test.bc -S). This step removes the bitcast and the load and directly starts returning 10. You will find this file in the clang source lib/Transforms/InstCombine/InstructionCombining.cpp. In the few seconds I looked at the file I could not find the place where aliasing matters. — overseas, Feb 10 '17 at 19:43

score 8 · Answer 1 · edited Jun 20 '20 at 09:12

8

If you remove B::x, then B meets the requirements in 9p7 for a standard-layout class and the access becomes perfectly well-defined because the two types are layout-compatible, 9.2p17 and the members both have the same type.

A standard-layout class is a class that:

has no non-static data members of type non-standard-layout class (or array of such types) or reference,

has no virtual functions (10.3) and no virtual base classes (10.1),

has the same access control (Clause 11) for all non-static data members,

has no non-standard-layout base classes,

either has no non-static data members in the most derived class and at most one base class with non-static data members, or has no base classes with non-static data members, and

has no base classes of the same type as the first non-static data member.

Two standard-layout struct types are layout-compatible if they have the same number of non-static data members and corresponding non-static data members (in declaration order) have layout-compatible types.

edited Jun 20 '20 at 09:12

Community

1
1

answered Jun 19 '13 at 14:57

Ben Voigt

277,958
43
419
720

Oh wait, 9.2p19 isn't the right clause because it talks about unions. But the rule IS there. Still looking. – Ben Voigt Jun 19 '13 at 15:00
Ok, I think 9.2p17 and 5.2.10p7 are relevant here. – Ben Voigt Jun 19 '13 at 15:06
OK they are layout compatible, but that still does not seem to allow to access one using an lvalue of another. 3.10p10 does not mention layout compatibility. Is there anywhere else something related to layout compatibility that makes it defined? – Johannes Schaub - litb Jun 19 '13 at 15:07
That's C++11. In C++03, the behavior is undefined. (But of course, one of the reasons the standard authors accepted defining it is because it actually worked in all implementations.) – James Kanze Jun 19 '13 at 15:09
@James: But I believe this question concerns Standard C++, which is to say, C++11. That's the assumption unless another version is specifically mentioned. – Ben Voigt Jun 19 '13 at 15:18
@JohannesSchaub-litb: You're accessing an object whose dynamic type is `int` via an lvalue of type `int&` via a structure type that contains an `int`. 3.10p10 says that is ok. – Ben Voigt Jun 19 '13 at 15:23
@BenVoigt I don't know. He specifically asked why g++ might be doing different things in the two cases. And g++ doesn't really implement C++11 yet (although this effect might be due to the experimental preliminary implementation). – James Kanze Jun 19 '13 at 15:23
@James: I answered why the behavior is different. When `B` has non-static data members, it ceases to be a *standard-layout class* and therefore is no longer *layout-compatible* with `A` or anything else. Which permits greater optimization. – Ben Voigt Jun 19 '13 at 15:24
@BenVoigt I made a test with enums, and GCC does alias optimize it. Question edited. – Johannes Schaub - litb Jun 19 '13 at 15:32
@JohannesSchaub-litb: Edited my answer to explicitly state the requirement that both members have the same type (which is required by the strict aliasing rule). I thought that was obvious, and I referred to it in my prior comment. – Ben Voigt Jun 19 '13 at 16:44
@BenVoigt then what difference is left compared to the enum case? We have two different types in both cases (B and A in one case are different classes and in another case are different enums). The difference does not matter because they are layout compatible (according to you, ?). Then in the struct case we must additionally check that the member has the same type. In the enum case, we do not have another type to check. So if one of these works, I would expect the enum case to work. Why is it the other way around? – Johannes Schaub - litb Jun 19 '13 at 16:48
1

@BenVoigt that bullet is not intended to apply to such cases. Otherwise, you could use an arbitrary struct to access the int if only the struct would also contain that int. That bullet is intended to prevent caching of a whole struct if you modify one of its member by using a pointer to that member (and vice versa). See the C rationale document. – Johannes Schaub - litb Jun 19 '13 at 17:54
Even so, let's assume that bullet applies, but then layout compatibility would be entirely irrelevant - after all the struct contains an int member! – Johannes Schaub - litb Jun 19 '13 at 18:02
1

@JohannesSchaub-litb: Strict aliasing requires that you use the type (here `int&`) that matches the object in the location. Layout compatibility guarantees that you're looking at the right location, i.e. `p->*&A::x` and `p->*&B::x` resolve to the same object, same alignment. In the enum case, layout compatibility again guarantees the right location and right alignment, but you have the wrong type. – Ben Voigt Jun 20 '13 at 06:21
2

@BenVoigt please bear with me, but I still do not understand why in the class case `B` and `A` do not have an alias conflict. Apparently you are saying that alias analysis does not check class types for matching, but only the members that are accessed. But if that is the case, why *do* the bullets list classes? It says "a type that is a (possibly cv-qualified) base class type of the dynamic type of the object" - this clearly concerns class types in an access. Or are you saying that a `a.b` does not constitute accessing the value of an object by an lvalue of type `decltype(a)`? – Johannes Schaub - litb Jun 20 '13 at 13:22
In other words, are you saying that this would be an aliasing violation: `*b = B()` (assuming my function's body of the question and that `b` points actually to an `A`). Then `b->x++` is valid (because the final lvalue is of type `int`, and the object expression has no bearing on aliasing analysis), but `*b = B()` is not (because the final lvalue is of type `B`)? I am sorry, but I do not understand what you are saying at all, yet. – Johannes Schaub - litb Jun 20 '13 at 13:24
1

@JohannesSchaub-litb the bullet list is one of the greatest mystery of C++. – curiousguy Jul 06 '13 at 00:05
2

The strict aliasing rule does not have an exception for layout-compatible classes, so it's not clear to me how you think this answers the question. Would suggest editing some more explanation into the answer. – M.M Feb 11 '17 at 02:25
@M.M in addition, Swifts answer shows that even layout compatible structs are "alias optimized" if they are not in an inheritance relationship. So this answer appears to be incorrect (?) – Johannes Schaub - litb Feb 11 '17 at 16:24

Swift - Friday Pie · Answer 2 · 2017-02-07T07:45:49.853

Undefined behavior includes case that it does work, even if it shouldn't.

According to standard usage of this union allows to access type and size fields of either header or data members:

union Packet {
   struct Header {
   short type;
   short size;  
   } header;
   struct Data {
   short type;
   short size;  
   unsigned char data[MAX_DATA_SIZE];
   } data;
}

This is strictly limited to unions, but many compilers support that as kind of extension, provided the "incomplete" type would be ended with array of undefined size. If you delete extra static non-member from child class it indeed becomes trivial and layout compatible, which allows aliasing?

struct A {
  int a;
};

struct B  {
  int a;
  //int x;
};

A a;

int g(B *b) {
   a.a = 10;
   b->a++;
   return a.a;
}

Still performs aliasing optimization, though. In your case with same amount of non-static members the most derived class is assumed to be same as the base class. Let's invert the order:

#include <vector>
#include <iostream>

struct A {
  int a;
};

struct B : A  {
  int x;
};

B a;

int g(A *b) {
   a.a = 10;
   b->a++;
   return a.a;
}

int main()
{
    std::cout << g((A*)&a);
}

This returns 11 as expected, as a B is clearly also an A, unlike the original attempt. Let's play further

struct A {
  int a;
};

struct B : A {
    int  foo() { return a;}
};

Would not cause aliasing optimization, unless foo() is virtual. Adding non-static or const member to B would result in "10" answer, adding non-trivial constructor or a static doesn't.

PS. In second example

enum A : int { X, Y };
enum B : int { Z };

Layout compatibleness between those two here is defined by C++14, and they aren't compatible with underlying type (but convertible). although something like

 enum A a = Y;
 enum B b = (B*)a;

may produce undefined behavior, just like if you try to map float with arbitrary 32bit value.

This confuses me: "Still performs aliasing optimization, though. ". So when A and B are unrelated classes that are layout compatible (actually, structucally *equivalent*!), then it does the optimization. But if everything is equal, but there's just a base-derived relationship like in my testcase, then it does not do the optimization. This indicates, that it's not a mere thing of "layout compatibility" or structural equivalence, but something to do with inheritance, right? Or am I misunderstanding the answer? — Johannes Schaub - litb, Feb 11 '17 at 16:19
Rewarding to this, since it was the most helpful answer so far. — Johannes Schaub - litb, Feb 11 '17 at 17:55
@Johannes Schaub - litb Ironically, my students, when challenged with this code, pointed at the (B*)&A and said "That's wrong", basing their understanding exactly on relation of the base and derived class (and they know about underlying address mechanism) If standard doesn't regulate it (or we didn't found that), it still kind of makes sense from the point of strict aliasing rule if we interpret "different type" in polymorphism manner? "A is not B" if they are unrelated, "B is also A" but not "equal" and "pointer to A might be pointer to B" when A is base class of B, — Swift - Friday Pie, Feb 12 '17 at 09:32

score 0 · Answer 3 · edited May 23 '17 at 12:34

0

I think your code is UB, as you are dereferencing a pointer that comes from a casting that violates the type aliasing rules.

Now, if you activate the strict aliasing flag, you are allowing the compiler to optimize the code for UB. How to use this UB is up to the compiler. You can see the answers for this question.

Regarding gcc, the documentation for -fstrict-aliasing shows it can optimize based on:

(...) an object of one type is assumed never to reside at the same address as an object of a different type, unless the types are almost the same.

I have not been able to find a definition for "almost the same", but two classes with exactly the same layout may be considered "almost the same" and they are left out of the optimization.

edited May 23 '17 at 12:34

Community

1
1

answered Feb 11 '17 at 01:18

J. Calleja

4,855
2
33
54

The function "g" in isolation does not have undefined behavior, so this cannot explain the difference in optimizer outputs of the two compilers. – Johannes Schaub - litb Feb 11 '17 at 17:11
When you activate the -fstrict-alisiang flag, you are telling the compiler that it can optimize assuming that you are not doing anything that leads to UB due to aliasing. You can only get to b == &a through UB. So, the optimizer notices that b != &a and g() will always return 10. So I though that the real question was why the optimizer is not always using this optimization, depending on the B declaration. I could not find anything on the standard but I found out that the documentation talks about the non-standard concept of "almost the same types". – J. Calleja Feb 11 '17 at 23:53

score 0 · Answer 4 · answered May 03 '17 at 16:21

I believe the following is legal C++ (without invoking UB):

#include <new>

struct A {
  int a;
};

struct B : A {
  // int x;
};

static A a;

int g(B *b);
int g(B *b) {
   a.a = 10;
   b->a++;
   return a.a;
}

int f();
int f() {
  auto p = new (&a) B{};
  return g(p);
}

because (the global) a always refers to an object of type A (even though it's subobject of a B-object after a call to f()) and p points to an object of type B.

If you mark a to have static storage duration (as I did above), all compilers I've tested will happily apply strict aliasing and optimize to return 10.

On the other hand, if you mark g() with __attribute__((noinline)) or add a function h() that returns a pointer to a

A* h();
A* h() { return &a; }

the compilers I've tested assume that &a and the parameter b can alias and reload the value.

Why doesn't GCC and Clang do this aliasing-optimization?

4 Answers4