
UPDATE: I do appreciate "don't want that, want this instead" suggestions. They are useful, especially when provided in context of the motivating scenario. Still...regardless of goodness/badness, I've become curious to find a hard-and-fast "yes that can be done legally in C++11" vs "no it is not possible to do something like that".


I want to "alias" an object pointer as another type, for the sole purpose of adding some helper methods. The alias cannot add data members to the underlying class (in fact, the more I can prevent that from happening the better!) All aliases are equally applicable to any object of this type...it's just helpful if the type system can hint which alias is likely the most appropriate.

There should be no information about any specific alias that is ever encoded in the underlying object. Hence, I feel like you should be able to "cheat" the type system and just let it be an annotation...checked at compile time, but ultimately irrelevant to the runtime casting. Something along these lines:

Node<AccessorFoo>* fooPtr = Node<AccessorFoo>::createViaFactory();
Node<AccessorBar>* barPtr = reinterpret_cast< Node<AccessorBar>* >(fooPtr);

Under the hood, the factory method is actually making a NodeBase class, and then using a similar reinterpret_cast to return it as a Node<AccessorFoo>*.

The easy way to avoid this is to make these lightweight classes that wrap nodes and are passed around by value. Thus you don't need casting, just Accessor classes that take the node handle to wrap in their constructor:

AccessorFoo foo (NodeBase::createViaFactory());
AccessorBar bar (foo.getNode());

But if I don't have to pay for all that, I don't want to. That would involve--for instance--making a special accessor type for each sort of wrapped pointer (AccessorFooShared, AccessorFooUnique, AccessorFooWeak, etc.) Having these typed pointers being aliased for one single pointer-based object identity is preferable, and provides a nice orthogonality.

So back to that original question:

Node<AccessorFoo>* fooPtr = Node<AccessorFoo>::createViaFactory();
Node<AccessorBar>* barPtr = reinterpret_cast< Node<AccessorBar>* >(fooPtr);

Seems like there would be some way to do this that might be ugly but not "break the rules". According to ISO14882:2011(e) 5.2.10-7:

An object pointer can be explicitly converted to an object pointer of a different type. When a prvalue v of type "pointer to T1" is converted to the type "pointer to cv T2", the result is static_cast<cv T2*>(static_cast<cv void*>(v)) if both T1 and T2 are standard-layout types (3.9) and the alignment requirements of T2 are no stricter than those of T1, or if either type is void. Converting a prvalue of type "pointer to T1" to the type "pointer to T2" (where T1 and T2 are object types and where the alignment requirements of T2 are no stricter than those of T1) and back to its original type yields the original pointer value. The result of any other such pointer conversion is unspecified.
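The round-trip guarantee in that paragraph can be sketched roughly as follows. `Plain` and `Tagged` are hypothetical standard-layout types invented for this illustration, and the sketch only converts the pointer there and back; it never dereferences it as the other type, which is a separate question.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical standard-layout types with identical alignment requirements.
struct Plain  { int value; };
struct Tagged { int value; };

static_assert(std::is_standard_layout<Plain>::value,  "Plain must be standard-layout");
static_assert(std::is_standard_layout<Tagged>::value, "Tagged must be standard-layout");
static_assert(alignof(Tagged) <= alignof(Plain), "Tagged must not be more strictly aligned");

// Converts Plain* -> Tagged* -> Plain* and reports whether the original
// pointer value came back. The intermediate Tagged* is never dereferenced.
bool roundTripPreservesPointer(Plain* original) {
    Tagged* held = reinterpret_cast<Tagged*>(original);
    Plain* restored = reinterpret_cast<Plain*>(held);
    return restored == original;
}

// Small usage example.
void demoRoundTrip() {
    Plain p{42};
    assert(roundTripPreservesPointer(&p));
    assert(p.value == 42);  // the object itself was never touched
}
```

Note that this only demonstrates holding the pointer as another type; it does not show that member access through `Tagged*` would be allowed.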

Drilling into the definition of a "standard-layout class", we find:

  • has no non-static data members of type non-standard-layout-class (or array of such types) or reference, and
  • has no virtual functions (10.3) and no virtual base classes (10.1), and
  • has the same access control (clause 11) for all non-static data members, and
  • has no non-standard-layout base classes, and
  • either has no non-static data member in the most-derived class and at most one base class with non-static data members, or has no base classes with non-static data members, and
  • has no base classes of the same type as the first non-static data member.

Sounds like working with something like this would tie my hands a bit with no virtual methods in the accessors or the node. Yet C++11 apparently has std::is_standard_layout to keep things checked.
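A compile-time check along those lines might look like the sketch below. The members of `NodeBase`, the inheritance relationship, and the `doubledValue` and `VirtualNode` names are assumptions made for illustration, since the real classes are not shown here. The check covers layout only; it says nothing about whether accessing an object through the other type is permitted.

```cpp
#include <type_traits>

// Hypothetical stand-in for the real NodeBase; its members are assumed.
class NodeBase {
public:
    int getValue() const { return m_value; }
private:
    int m_value = 0;
};

struct AccessorFoo {};
struct AccessorBar {};

// Hypothetical alias type: adds member functions only, no data, no virtuals.
template <typename AccessorT>
class Node : public NodeBase {
public:
    int doubledValue() const { return 2 * getValue(); }
};

// True only while Node<AccessorT> stays a standard-layout class.
template <typename AccessorT>
constexpr bool isStandardLayoutNode() {
    return std::is_standard_layout<Node<AccessorT>>::value;
}

// Compilation stops here if someone adds a data member or a virtual function.
static_assert(isStandardLayoutNode<AccessorFoo>(), "Node<AccessorFoo> must be standard-layout");
static_assert(isStandardLayoutNode<AccessorBar>(), "Node<AccessorBar> must be standard-layout");

// Counterexample: one virtual function is enough to lose standard layout.
struct VirtualNode : NodeBase {
    virtual ~VirtualNode() {}
};
static_assert(!std::is_standard_layout<VirtualNode>::value, "virtuals break standard layout");
```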

Can this be done safely? Appears to work in gcc-4.7, but I'd like to be sure I'm not invoking undefined behavior.

Community
  • The initial syntax should be `Node& foo = *new Node;`. I still don't understand the question though :) – iammilind Jun 27 '12 at 03:58
  • @iammilind Oops, typo, fixed...thanks! In principle it's probably quite similar to [this question](http://stackoverflow.com/questions/2475920/how-to-make-2-incompatible-types-but-with-the-same-members-interchangeable), I just have more control over the types involved and can rewrite everything from scratch. So I'm wondering what options that might open up for me in this specific scenario. – HostileFork says dont trust SE Jun 27 '12 at 04:02
  • "*I want to "alias" an object pointer as another type, for the sole purpose of adding some helper methods.*" Why? What do member functions give you that free functions cannot? Unless I'm missing something obvious (very possible), this sounds like a bit of misguided OOP zealotry... – ildjarn Jul 13 '12 at 01:42
  • @ildjarn Maybe? I've lamented that SO is often a damned-if-you-do-explain (*"What's with all the specifics, what's your actual question?"*) vs. damned-if-you-don't-explain (*"Why would you need that?"*) environment. :( I have a final class that has NC const operations and N non-const operations, lives in a memory mapped file. Sometimes I'd like to augment and "type" the interface to this class as a hint to help people out. The DOM is the [best analogy I've got, sadly](http://stackoverflow.com/questions/11167483/deriving-from-a-base-class-whose-instances-reside-in-a-fixed-format-database-m) – HostileFork says dont trust SE Jul 13 '12 at 02:08

4 Answers


I believe the strict aliasing rules forbid what you are trying to do.

To clarify: strict aliasing has nothing to do with layout compatibility, POD types or what not. It has to do with optimization. And with what the language explicitly forbids you to do.

This paper sums it up rather well: http://dbp-consulting.com/StrictAliasing.pdf
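As a rough illustration of the kind of access the rule is about, the sketch below contrasts two accesses that the aliasing list explicitly permits (through the unsigned counterpart of the type, and through `unsigned char`) with copying bytes via `std::memcpy`, which is the usual well-defined way to view a value as an unrelated type. All function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Permitted: an int may be read through an lvalue of its unsigned counterpart.
unsigned readAsUnsigned(int& value) {
    return *reinterpret_cast<unsigned*>(&value);
}

// Permitted: any object may be inspected byte by byte through unsigned char.
unsigned char firstByte(const int& value) {
    return *reinterpret_cast<const unsigned char*>(&value);
}

// Reading a float through a std::uint32_t lvalue would violate the rule,
// so the bytes are copied into a separate object instead.
std::uint32_t bitsOfFloat(float f) {
    static_assert(sizeof(std::uint32_t) == sizeof(float), "size mismatch");
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}

// Small usage example.
void demoAliasing() {
    int seven = 7;
    assert(readAsUnsigned(seven) == 7u);
    int zero = 0;
    assert(firstByte(zero) == 0);
}
```

Two unrelated class types with identical members fall into neither permitted category, which is why layout compatibility alone does not settle the question.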

Paul Groke
  • +1 as I found some useful tags ("type-punning", for instance) and jumping off points from strict aliasing. But most of the strict aliasing links I found seem to talk about types with different size or layout, and here it's only methods that are different w/no virtual dispatch as all are constructed as the same type. It seems that it's even okay to cast from signed to unsigned variations of the same type in strict aliasing...so I would think that this question of method dispatch would depend on something like this "standard layout" definition (if anything), or maybe some other umbrella rule? – HostileFork says dont trust SE Jul 03 '12 at 10:12
  • Interesting, thanks for the update w/the PDF, I read over that and see what you're talking about. Though now I wonder...if all the class methods were `const`, so the data members were never modified during the lifetime...would *that* be legal? (And if not, why not?) – HostileFork says dont trust SE Jul 06 '12 at 03:57
  • @HostileFork: It would not because the compiler need not guarantee that while you are executing a `const` method on the object it does not change under your feet through its other type. – Matthieu M. Jul 07 '12 at 12:03
  • @MatthieuM. In light of the answer by jthill, is there such a thing as a "safety cast" to the fundamental type before each access which can sidestep the aliasing problems, if adhered to strictly? – HostileFork says dont trust SE Jul 08 '12 at 22:47
  • @HostileFork: Honestly? I don't know. And I don't even see the point. The C++ Standard is extremely intricate and hard to navigate and I don't have the time, at the moment, to try and check all the seemingly broken answers your question generated. So I will only say this: the closer you get to the edges of the specifications, the more likely you are to hit a compiler bug. So, as an engineer, I advise you to be pragmatic and avoid playing mind games with your compiler; whether you are right or wrong is irrelevant: when you get a crash in production code, *you* lose, not the compiler writers. – Matthieu M. Jul 09 '12 at 06:37
  • I'm going to award this answer the bounty, as it brought to light this "strict aliasing" I did not already know about...which does seem fundamentally important to worry about as a reason for why this could be technically prohibited. There still seems to be a gray area as to whether a "safety cast" which invokes mention of the base type is enough to workaround this (see [jthill's answer](http://stackoverflow.com/a/11378177/211160)) although this does of course start to get too brittle to apply in most reasonable circumstances. – HostileFork says dont trust SE Jul 10 '12 at 00:19

If I understand you correctly, you have:

  • A NodeBase class that is stateful, and the true workhorse of the system;
  • a set of stateless Accessor types that provide an interface to NodeBase; and
  • a Node<AccessorT> class which wraps an accessor, presumably providing convenience functions.

I assume the last bit because if you don't have a wrapper type that does convenience stuff, then there's no reason not to make the Accessor types your top-level, like you suggested: pass AccessorFoo and AccessorBar around by value. The fact that they aren't the same object is entirely moot; if you think of them like the pointers that they are, then you'll note that &foo != &bar is no more interesting than having NodeBase* p1 = new NodeBase; NodeBase* p2 = p1; and noting that, of course, &p1 != &p2.

If you really do need a wrapper Node<AccessorT> and want to make it standard-layout, then I would suggest that you use the statelessness of your Accessor types to your advantage. If they are simply a stateless container of functionality (which they must be; why else would you be able to freely cast them?), then you can do something like this:

struct AccessorFoo {
    int getValue(NodeBase* n) { return n->getValueFoo(); }
};

struct AccessorBar {
    int getValue(NodeBase* n) { return n->getValueBar(); }
};

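template <typename AccessorT>
class Node {
    NodeBase* m_node;  // non-owning pointer to the stateful workhorse

public:
    // Lets one Node<...> be built from another's underlying pointer.
    explicit Node(NodeBase* node) : m_node(node) {}

    int getValue() {
        AccessorT accessor;  // stateless, so constructing one per call is cheap
        return accessor.getValue(m_node);
    }
};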

In this case, you could add a templated conversion operator:

template <typename OtherT>
operator Node<OtherT>() {
    return Node<OtherT>(m_node);
}

And now you've got direct value conversion between any Node<AccessorT> type you like.

If you take it just a bit further, you'll make all the methods of the Accessor types static, and arrive at the traits pattern.
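A minimal sketch of that traits-style variant, assuming a stand-in `NodeBase` whose constructor and stored values are invented for illustration (only `getValueFoo` and `getValueBar` come from the code above):

```cpp
#include <cassert>

// Hypothetical stand-in for the real NodeBase; constructor and members assumed.
class NodeBase {
public:
    NodeBase(int foo, int bar) : m_foo(foo), m_bar(bar) {}
    int getValueFoo() const { return m_foo; }
    int getValueBar() const { return m_bar; }
private:
    int m_foo;
    int m_bar;
};

// Accessors as traits: static functions only, never instantiated.
struct AccessorFoo {
    static int getValue(const NodeBase& n) { return n.getValueFoo(); }
};
struct AccessorBar {
    static int getValue(const NodeBase& n) { return n.getValueBar(); }
};

template <typename AccessorT>
class Node {
public:
    explicit Node(NodeBase* node) : m_node(node) {}

    int getValue() const { return AccessorT::getValue(*m_node); }

    // Value conversion to a handle that uses a different accessor.
    template <typename OtherT>
    operator Node<OtherT>() const { return Node<OtherT>(m_node); }

private:
    NodeBase* m_node;  // non-owning
};

// Small usage example: both handles refer to the same NodeBase.
int demoTraits() {
    NodeBase base(1, 2);
    Node<AccessorFoo> foo(&base);
    Node<AccessorBar> bar = foo;  // uses the templated conversion operator
    return foo.getValue() * 10 + bar.getValue();
}
```

Because each `Node<AccessorT>` is a distinct object holding a copy of the pointer, no cast between unrelated pointer types is involved.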


The section of the C++ standard that you quoted, incidentally, concerns the behavior of reinterpret_cast<T*>(p) in the case that both the source type and the final type are pointers to standard-layout objects, in which case the standard guarantees that you get the same pointer you'd get from casting to a void* and then to the final type. You still don't get to use the object as any type other than the type it was created as without invoking undefined behavior.

John Calsbeek
  • +1 Yup you got the gist, and accessor methods *should* be static...though that's going to be tough on the Accessor authors compared to being able to call NodeBase methods when they were inheriting from it. :-/ I'm explicitly wishing for these `Node` types to not be separate from the "workhorse" NodeBase instance because I use `unique_ptr>` to follow the hot-potato of ownership for Nodes. To keep using that I'd have to add yet another level of heap-allocation-tracking on these `Node` handles instead of pass by value. So the standard layout trick is what I'm most curious about. – HostileFork says dont trust SE Jun 27 '12 at 05:41
  • @HostileFork Why not invert that, and put the `unique_ptr` inside of `Node`? If you need other kinds of smart pointers, it's easy enough to make different variants of `Node`, especially if it's a small wrapper around the accessor. Also, looking carefully, I don't think the standard-layout trick is what you think it is. The quoted section is about `reinterpret_cast` sharing semantics with `static_cast` in the case of standard-layout types. You still don't get to change types without invoking undefined behavior. – John Calsbeek Jun 27 '12 at 05:56
  • I didn't pick up on that bit...hm...in actuality the nodes are created with a factory method as NodeBase and then cast down in the return...would that be legitimate here or does it change nothing? As for making lightweight classes like `UniqueNode` which are different from `Node`...sounds possible, lots of things are *possible*! I'm just trying to hammer this into a clean separation of concerns that doesn't waste its time with unnecessary combinatorial explosions of classes and is as efficient as possible. (I'd probably use another language if I didn't find that motivating.) :) – HostileFork says dont trust SE Jun 27 '12 at 06:14
  • It would appear that even with standard layout types, this is prohibited by strict aliasing requirements (see @PaulGroke's [response](http://stackoverflow.com/a/11302435/211160)). In which case, the only thing the standard is allowing you to do is to temporarily hold the pointer as another type and then convert it back, but not use it in the meantime? (Perhaps more specifically, not call any non-const methods?) – HostileFork says dont trust SE Jul 07 '12 at 11:24
  • After running around on this question for a while, here's what I believe I'm going to go with: [sample library for pattern and test program on Gist](https://gist.github.com/3106817). "Foo" in this case is the Node, and "Wrapper" is what I'll probably wind up calling NodeRef or somesuch. It still seems a bit of a runaround in order to achieve the effect vs. the "type-punning"--but if it eliminates undefined behavior and the client code looks ok, I guess it's fine... ! – HostileFork says dont trust SE Jul 13 '12 at 20:56

The term Accessor is a dead giveaway: what you are looking for is a Proxy.

There is no reason for a Proxy not to be passed around by value.

// Let us imagine that NodeBase is now called Node, since there is no inheritance

class AccessorFoo {
public:
    AccessorFoo(Node& n): node(n) {}

    int bar() const { return node.get().bar; }  // reference_wrapper has no operator->, so go through get()

private:
    std::reference_wrapper<Node> node;
};

And then you can freely convert from one accessor to another... though this smells. Normally the very goal of having an accessor is to restrict access in some way, so casting willy-nilly to another accessor is bad. However, one could support casting to a narrower accessor.
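One way such a narrowing conversion might look is sketched below. `ReadAccessor`, `WriteAccessor`, and `setBar` are hypothetical names chosen for illustration, and `Node` is reduced to a single assumed `bar` member.

```cpp
#include <cassert>
#include <functional>

// Hypothetical node type with one assumed member.
struct Node {
    int bar;
};

// Narrow proxy: read-only view of the node.
class ReadAccessor {
public:
    explicit ReadAccessor(Node& n) : node(n) {}
    int bar() const { return node.get().bar; }
private:
    std::reference_wrapper<Node> node;
};

// Wider proxy: read/write view, convertible to the narrower one only.
class WriteAccessor {
public:
    explicit WriteAccessor(Node& n) : node(n) {}
    int bar() const { return node.get().bar; }
    void setBar(int value) { node.get().bar = value; }

    // Narrowing is allowed; there is no conversion in the other direction.
    operator ReadAccessor() const { return ReadAccessor(node.get()); }
private:
    std::reference_wrapper<Node> node;
};

// Small usage example: write through the wide proxy, read through the narrow one.
int demoNarrowing() {
    Node n{1};
    WriteAccessor writer(n);
    writer.setBar(5);
    ReadAccessor reader = writer;  // narrowing conversion, by value
    return reader.bar();
}
```

Both proxies are copied by value and refer to the same `Node`, so the write made through `writer` is visible through `reader`.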

Matthieu M.
  • +1 Thanks..although from pages on that site it would be more like a [Facade](http://www.vincehuston.org/dp/facade.html), though in practice I might want a [Decorator](http://www.vincehuston.org/dp/decorator.html). *"Adapter provides a different interface to its subject. Proxy provides the same interface. Decorator provides an enhanced interface (...) Facade defines a new interface, whereas Adapter reuses an old interface."* I'm willing to have the object-relational-model be a compromise to match up to my desires (any choices will be) but just don't want *actively illegal* C++. Hence question. – HostileFork says dont trust SE Jun 27 '12 at 07:05
  • @HostileFork: The names are always problematic. A `Facade` is generally more an aggregation of interfaces (for example `libclang` is a C-Facade over the Clang libraries). The idea that a `Proxy` provides the same interface is quite restrictive. From Wikipedia, a `Proxy` *provides a placeholder for another object to control access, reduce cost, and reduce complexity.* Your question looks a lot like *control access*, in which case it is natural to present a restricted interface. – Matthieu M. Jun 27 '12 at 08:20
  • You say *"There is no reason for a Proxy not to be passed around by value"* which runs counter to my notion that there could be a reason. Namely: if you want an effective "is-a" relationship then if a pointer to the Proxy "were-a" pointer to the node as well, then you can use unique_ptr (shared_ptr, weak_ptr, etc.) with the Proxy in the same way you might use it with a direct Node pointer. This works if you're using C++ inheritance to implement the pattern, but if your object is a realization of an external database item it doesn't. I'm asking if it's possible to finesse this gap in legal C++. – HostileFork says dont trust SE Jun 27 '12 at 16:07
  • @HostileFork: But the `Proxy` *is not* a Node, since it is specifically meant to restrict the available interface. Isn't it? – Matthieu M. Jun 27 '12 at 17:31
  • Sorry that I've not really pinned down whether it's intended to "restrict" or "augment" the Node interface. The bigger point is that given a client's understanding that it can *always* get back a generic node, any "restriction" would only be a "hey are you sure you want to do this" step, encouraging you to check the accessor interface to see if a higher-order operation would be more suited. If "restriction" turned out to actually be the point, I'd be doing the actual casting inside a wrapper method, to mimic the "only derived classes know about the is-a" relationship of protected inheritance. – HostileFork says dont trust SE Jun 27 '12 at 17:42
  • @HostileFork: In that case, you still don't need inheritance. A simple `explicit operator Node& () { return *node; }` (and its `const` counterpart) would be sufficient. – Matthieu M. Jun 27 '12 at 18:09
  • Yes...if one "contains" a node pointer, then it is certainly possible to extract the contained node via a cast operator, while not inheriting from node. (?) Maybe I'm misunderstanding your point, but I don't see it applying to this perhaps Quixotic and/or self-contradicting goal to try and get those "is-a" ("is-really-only-a"?) benefits with regard to a new interface for an object passed by pointer. Wrapping it up into a new value class--even if it has cast operations--won't hit that particular property. – HostileFork says dont trust SE Jun 27 '12 at 19:15
  • It would appear that even with standard layout types, this is prohibited by strict aliasing requirements (see @PaulGroke's [response](http://stackoverflow.com/a/11302435/211160)). In which case, the only thing the standard is allowing you to do is to temporarily hold the pointer as another type and then convert it back, but not use it in the meantime? (Perhaps more specifically, not call any non-const methods?) – HostileFork says dont trust SE Jul 07 '12 at 11:25
  • I appreciate the time you've taken in your responses...and your point raised about how treading into dark areas of the specification is probably more trouble than it is worth in practice. Still, in the interests of formal understanding, I think it can be worth it to ferret out the answers to these kinds of questions, to the extent they exist. The point of the [tag:language-lawyer] tag is to pursue these understandings, whether the motivations are "reasonable" or not...! – HostileFork says dont trust SE Jul 10 '12 at 00:27
  • I've taken your advice (I hope/think?) and tinkered through the implementation of a Proxy system that *hopefully* does not introduce any undefined behavior issues, yet gives the client behavior I want. You can make the Accessor inherit protected or public. It has a template which inherits from its own partial template specialization in order to implement the mutable variant as a derived version of the const variant, which finesses a couple of things. It's relatively short and weird, but [here's the gist (pun intended)](https://gist.github.com/3106817) – HostileFork says dont trust SE Jul 14 '12 at 00:37
  • You're probably tired of this question (and I don't blame you), but I'm resolving answers for all my un-accepted questions. And if you have anything to say about the related codereview: [here it is](http://codereview.stackexchange.com/questions/33713/proxy-facade-implementation-concept-in-c11-impedance-matching-db-with-classes) *(currently bountied)*. – HostileFork says dont trust SE May 03 '14 at 02:45
  • Is there any reason to use a `reference_wrapper` (a pointer that is itself wrapped with sugar-coating accessors) here when an _actual_ reference member would work just fine? Seems like unnecessary complication to me, unless I'm missing something. – underscore_d Apr 08 '16 at 07:44
  • @underscore_d: making assignment work. You cannot assign a new reference to an existing reference (aka, reseat the reference), it just assigns the *value* referred to. – Matthieu M. Apr 08 '16 at 16:18
  • Ah, course, that's what I was missing. I forgot about that benefit of `reference_wrapper`, since I've never needed to use it and tend to forget the finer details. Thanks! – underscore_d Apr 08 '16 at 17:32

#include <iostream>

struct myPOD {
   int data1;
   // ...
};

struct myPOD_extended1 : myPOD {
   void helper() { ((myPOD*)this)->data1 = 6; }  // type myPOD referenced here
};
struct myPOD_extended2 : myPOD {
   void helper() { data1 = 7; }                   // no syntactic ref to myPOD
};
struct myPOD_extended3 : myPOD {
   void helper() { ((myPOD*)this)->data1 = 8; }  // type myPOD referenced here
};
void doit(myPOD *it)
{
    ((myPOD_extended1*)it)->helper();
    // ((myPOD_extended2*)it)->helper(); // uncomment -> program/compile undefined
    ((myPOD_extended3*)it)->helper();
}

int main(int,char**)
{
    myPOD a; a.data1=5;
    doit(&a);

    std::cout<< a.data1 << '\n';
    return 0;
}

I believe this is guaranteed to work in all conforming C++ compilers and must print 8. Uncomment the marked line and all bets are off.

An optimizer might prune searches for valid aliased references by checking the list of syntactic types actually referenced in a function against the list (in 3.10p10) of syntactic types of references it's required to produce correct results for -- and when an actual ("dynamic") object type is known, that list doesn't include access through a reference to any derived type. So the explicit (and valid) downcasts of this to myPOD* in the helper()s puts myPOD on the list of types syntactically referenced there, and the optimizer has to treat the resulting references as potential legal references to (other names for, aliases of) the object a.

jthill
  • The aliasing arguments others have made seem to suggest that the compiler is free to assume that aliases as other types than the one by which a class was allocated (`myPOD` in this case) do not exist. Hence `data1` could be changed in one of the aliases and then that change not reflected in another. @MatthieuM has suggested even constness doesn't protect you as one of the aliases may be non-const. Can you cite a reason why you think this would be an exception to the rule? – HostileFork says dont trust SE Jul 08 '12 at 06:06
  • You can convert a `structA*` to a(n even completely unrelated) `structB*` and back again, and the compiler has to allow for that possibility: converting a pointer and passing it along doesn't mean that converted value can never be used to modify the original object. – jthill Jul 08 '12 at 11:45
  • It seems what you're saying runs against the [strict aliasing paper](http://dbp-consulting.com/StrictAliasing.pdf), which suggests that you can do this only when you're casting between the actual type with which something was allocated (such as if you had declared something as myPOD_extended1, and then you cast it to myPOD and back). This is the central argument of the question I'm asking and it really does sound like strict aliasing permits the compiler to make optimization assumptions which would undermine this in other cases. :-/ – HostileFork says dont trust SE Jul 08 '12 at 17:56
  • Once the compiler sees `doit(&a)` it's not permitted to assume `a` will not be accessed through that pointer or any copy. The upcasts in `doit` are explicitly permitted, see 5.4p4, and the `helper()` functions receive a copy of that upcast pointer (as `this`), so the compiler can't assume they don't reference the `myPOD` through that copy either. I've added safety casts to the helper functions; I think a compiler would have to go out of its way to mishandle this case but I see it wasn't strictly correct before (which in this discussion means 'was wrong'). I believe it's strictly correct now. – jthill Jul 08 '12 at 21:26
  • Ok...so if you're saying that doing the cast back to the base type before any modifications prevents "aliasing", then what I want to do would be legal...? Because my derived types never add any members. I guess the trick is to make sure that all data member accesses go through what you are calling a "safety cast" (reads AND writes) instead of trying to access directly w/o a cast, which *would* be aliasing and therefore illegal? – HostileFork says dont trust SE Jul 08 '12 at 22:40
  • tl;dr: yes. Long: I can (now) see that an optimizer would prune searches for valid aliased references by checking the list of syntactic types actually referenced in a function against the list (in 3.10p10) of syntactic types that it's required to produce correct results for -- and when an actual ("dynamic") object type is known, that list doesn't include access through a reference to any derived type. Fair enough. So explicitly downcasting `this` puts `myPOD` on the list of types referenced in the `helper()`s, so if the optimizer can't prove `this != &a` it must treat them as (valid) aliases – jthill Jul 08 '12 at 23:50
  • This answer *looks wrong* to me. `a` is not a `myPOD_extended1` (or `2` or `3`) so the idea of casting to pointer to this type looks wrong; and I don't see how the paragraphs you cite would somehow justify this. – Matthieu M. Jul 09 '12 at 06:22
  • If you read 5.4p4 you'll see that the first applicable conversion is the `reinterpret_cast`. 5.2.10p7 is the applicable paragraph there, and includes "A pointer to an object can be explicitly converted to a pointer to a different object type.[...] Converting a prvalue of type “pointer to T1” to the type “pointer to T2” [...] and back to its original type yields the original pointer value." The upcast pointer is never dereferenced. All casts are going to "look wrong": they're there to work around syntactical limitations in the language. Things like this are exactly what 5.2.10 is for. – jthill Jul 09 '12 at 14:06
  • Interesting, I'm going to have to look at this in detail and dig up the references. I wonder if you adopt my initial reading that the category of classes this would be legal to do would be "standard layout" types, and not just POD? *(Also: terminology note...the closer you go to the base class it's "upcasting". Seems kind of backwards as you'd think the term "base" would mean "bottom-most", but no... :-/)* – HostileFork says dont trust SE Jul 09 '12 at 21:13