
I think I'm really asking: is aliasing "transitive"? If the compiler knows that A might alias B, and B might alias C, then surely it should remember that A might therefore alias C. Or is this "obvious" transitive reasoning not actually required?

An example, for clarity. The most interesting example, to me, of a strict-aliasing issue:

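// g++    -fstrict-aliasing -std=c++11 -O2
#include <iostream>

union
{
    int i;
    short s;
} u;
int     * i = &u.i; // i always points at the int member of u

int main()
{

    u.i = 1; // line 1: write the int member directly
    *i += 1; // line 2: modify the same int through the pointer

    short   & s =  u.s;
    s += 100; // line 3: modify the short member through a reference

    std::cout
        << " *i\t" <<  *i << std::endl // prints 2 (g++ 5.3.0)
        << "u.i\t" << u.i << std::endl // prints 101 (g++ 5.3.0)
        ;

    return 0;
}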

g++ 5.3.0 on x86_64 (but not clang 3.5.0) gives the above output, where *i and u.i print different numbers. But they should print exactly the same number, because i is initialised with int * i = &u.i; and never changes afterwards.

I have a theory: When 'predicting' the value of u.i, the compiler asks which lines might affect the contents of u.i. That includes line 1 obviously. And line 2 because int* can alias an int member of a union. And line 3 also, because anything that can affect one union member (u.s) can affect another member of the same union. But when predicting *i it doesn't realise that line 3 can affect the int lvalue at *i.

Does this theory seem reasonable?

I find this example funny because I don't have any casting in it. I managed to break strict aliasing without doing any casting.

Aaron McDaid
  • http://en.cppreference.com/w/cpp/language/union – Baum mit Augen Sep 28 '16 at 20:43
  • Firstly, union-based type punning is only allowed in C. Secondly, the permission to type-pun is only given when union members are accessed directly as union members. Otherwise, your transitivity would immediately obliterate all aliasing restrictions, since the compiler would generally have to assume that two unrelated pointers might point to members of one union object. – AnT stands with Russia Sep 28 '16 at 20:43
  • (I'm now trying to delete this question. I never knew that C++ unions were so different than in C. But I can't delete it. Sorry for the dumb question folks!) – Aaron McDaid Sep 28 '16 at 20:47
  • I've just asked the C-based version of this question here: http://stackoverflow.com/questions/39757658/violating-of-strict-aliasing-in-c-even-without-any-casting – Aaron McDaid Sep 28 '16 at 21:09

2 Answers


Reading from an inactive member of a union is undefined behavior in C++. (It is legal in C99 and C11.)

So, all in all, the compiler isn't required to assume/remember anything.

Standardese:

N4140 §9.5[class.union]/1

In a union, at most one of the non-static data members can be active at any time, that is, the value of at most one of the non-static data members can be stored in a union at any time.
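
For illustration only, here is a minimal sketch of how the effect the question is after (changing the short-sized part of an int) could be written without reading an inactive union member, by copying bytes with std::memcpy. The helper name add_to_first_short is hypothetical and not from the question, and which part of the int's value changes depends on the platform's byte order.

```cpp
#include <cstring>

// Hypothetical helper (not from the question): treat the first
// sizeof(short) bytes of an int as a short, add `delta` to it,
// and write the result back. All access to the int's bytes goes
// through std::memcpy, so no inactive union member is ever read.
int add_to_first_short(int value, short delta) {
    short s;
    std::memcpy(&s, &value, sizeof s);    // copy the leading bytes out
    s = static_cast<short>(s + delta);    // arithmetic on a real short object
    std::memcpy(&value, &s, sizeof s);    // copy the modified bytes back
    return value;
}
```

Calling add_to_first_short(2, 100) gives one well-defined result per platform: the caller sees a single int value, so there is no second "view" of the storage that could disagree with it.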

krzaq
  • I thought I'd read a lot of stuff about strict-aliasing on SO, but I never noticed that before. I think I need to take a break from strict-aliasing questions :) – Aaron McDaid Sep 28 '16 at 20:46
  • I've just managed to replicate the same issue in C. I guess I should ask another question. I can't delete this question, and I guess I shouldn't change this question to make it about C – Aaron McDaid Sep 28 '16 at 20:50
  • I lack the required knowledge to say whether the compiler is required to consider aliasing of an otherwise legally cast variable in C. – krzaq Sep 28 '16 at 20:51
  • @AaronMcDaid Just ask a new question, this one is not *that* bad either. – Baum mit Augen Sep 28 '16 at 20:55
  • Thanks @BaummitAugen. I've just asked another question in C. http://stackoverflow.com/questions/39757658/violating-of-strict-aliasing-in-c-even-without-any-casting – Aaron McDaid Sep 28 '16 at 21:10

In C++, only the union member that was last written to may be read.

Aliasing outside unions is only allowed between 'similar' types (for details please see this Q/A), and through char/unsigned char. An object of another type may be accessed through char/unsigned char, but a char/unsigned char object may not be accessed through other types. If the latter were allowed, then every object would have to be treated as possibly aliasing any other object, because they could be 'transitively aliased' through char/unsigned char in the way you describe.

But because this is not the case, the compiler can safely assume that only objects of 'similar' types and char/unsigned char alias each other.
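
As a small sketch of the permitted direction (reading another type's bytes through unsigned char), the following hypothetical helper bytes_of, which is my own name and not part of any library, dumps the object representation of an int as hex. It assumes 8-bit bytes.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: hex dump of the bytes of an int.
// Reading any object through an unsigned char pointer is allowed
// by the aliasing rules. Assumes 8-bit bytes.
std::string bytes_of(const int& value) {
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&value);
    static const char digits[] = "0123456789abcdef";
    std::string out;
    for (std::size_t k = 0; k < sizeof value; ++k) {
        out += digits[(p[k] >> 4) & 0x0f];  // high nibble
        out += digits[p[k] & 0x0f];         // low nibble
    }
    return out;
}
```

The reverse direction, for example declaring an unsigned char array and then reading it through an int*, is not covered by this permission.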

Community
alain
  • Applying aliasing non-transitively in cases where that makes sense would be more helpful than blocking useful forms of aliasing for fear of transitivity. For example, it would make sense to say that if T1 and T2 appear together in one union, and T1 and T3 appear together in another, accesses via T1* should be presumed to "dirty" things of types T2 and T3, and accesses via T2* or T3* should "dirty" things of type T1, but accesses via T2* need not dirty things of T3*, nor vice versa. – supercat Oct 04 '16 at 21:22
  • @supercat wouldn't the compiler have to know all unions in all TUs, if this was allowed? – alain Oct 04 '16 at 22:08
  • The rule for use of common-initial-sequences, which the authors of gcc don't like and blithely ignore, would only require compilers to consider unions whose complete declaration was visible at the point of usage. That rule only covers access of common initial sequences of structures within a union (rather than use of unions for other kinds of type punning) but the principle could be applied elsewhere (trading off optimization for semantic expressiveness). – supercat Oct 04 '16 at 22:14
  • If the rule were interpreted so as to allow an S to be used as a T, if a *complete declaration* of the union was visible at places where the object was used as an S, *or* where it was used as a T, that would allow many useful constructs which used to be supported but are now supported only in the `-fno-strict-aliasing` dialect of gcc. For example, if many structures have a common initial sequence, and accompanying each such structure is a definition for a union which contains that type and a specific type that just contains that sequence, then a function which accepts a pointer to the latter struct... – supercat Oct 05 '16 at 16:37
  • ...would be able to operate upon any of the structure types that were associated with it. The authors of gcc seem to think recognizing aliasing in that case would totally trash optimizations, but having code which accepts the "common" type presume that it might alias with the others would hurt optimization far less than would requiring programmers to write it in such a way that the compiler would have to presume it capable of aliasing any object of any type, anywhere, whose address has ever been exposed to the outside world. – supercat Oct 05 '16 at 16:40
  • I have no problem with the idea that compilers should not be required to presume that aliasing might occur in cases where they would have no reason to expect it. I have a big problem with the idea that compilers given something like `uint32_t get_float_bits(float *f) { return *(uint32_t*)f; }` should use the rules to infer that `f` cannot possibly point to a `float`, rather than using the typecast as an indication that a `float` is likely to be read as a `uint32_t`. – supercat Oct 05 '16 at 16:44
  • What I miss the most in C++ is the ability to use a union to reinterpret a buffer as a struct. C allows it, but C++ doesn't, which I really don't understand, because it forces the programmer to use an 'unnecessary', possibly expensive memcpy. – alain Oct 05 '16 at 19:46
  • I don't know that such structures are in practice any less safe in C++ than in C; clang and gcc both push very aggressive "interpretations" of the Standard which severely limit what can be done 100% reliably (it's unclear sometimes what code-breaking behaviors are a result of bugs or design, but if a pattern isn't supported reliably it's not reliable). By contrast, in C++ I think it should be possible to populate storage as one PODS, then manually invoke the destructor and use placement new to place a new object on the old storage. – supercat Oct 05 '16 at 19:56
  • Yes, I think in practice some techniques not involving a memcpy work well, the problem is the standard. I have the impression memcpy is the only standard-sanctioned way, but I'm not sure. – alain Oct 05 '16 at 20:47
  • In C++, I think the explicit combination of a trivial destructor and placement new will have defined behavior. In C, even memmove isn't guaranteed safe. In C, if memmove is used to copy something with a declared type to a region of allocated storage, the Standard would allow a compiler to presume that the destination cannot alias anything of any other type, and in at least some cases--whether by bug or design--gcc seems to exploit that freedom. – supercat Oct 05 '16 at 21:13
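
For reference, a sketch of the memcpy-based form of the get_float_bits function mentioned in the comments above. This is one commonly used way to write it in C++, not something taken from the thread, and it assumes float is 32 bits wide (checked at compile time).

```cpp
#include <cstdint>
#include <cstring>

// Sketch: read the bit pattern of a float without casting the pointer
// to an unrelated type. Assumes sizeof(float) == sizeof(std::uint32_t).
std::uint32_t get_float_bits(const float* f) {
    static_assert(sizeof(float) == sizeof(std::uint32_t),
                  "this sketch assumes a 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, f, sizeof bits);  // byte copy, no aliasing violation
    return bits;
}
```

On a platform with IEEE 754 single-precision floats, passing a pointer to 1.0f yields 0x3f800000.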