
Continuing something learned in C++ error: base function is protected ...

The C++11 pointer-to-member rules effectively strip the protected keyword of any value, because protected members can be accessed in unrelated classes without any evil/unsafe casts.

To wit:

class Encapsulator
{
  protected:
    int i;
  public:
    Encapsulator(int v) : i(v) {}
};

Encapsulator f(int x) { return x + 2; }

#include <iostream>
int main(void)
{
    Encapsulator e = f(7);
    // forbidden: std::cout << e.i << std::endl; because i is protected
    // forbidden: int Encapsulator::*pi = &Encapsulator::i; because i is protected
    // forbidden: struct Gimme : Encapsulator { static int read(Encapsulator& o) { return o.i; } };

    // loophole:
    struct Gimme : Encapsulator { static int Encapsulator::* it() { return &Gimme::i; } };
    int Encapsulator::*pi = Gimme::it();
    std::cout << e.*pi << std::endl;
}

Is this really conformant behavior according to the Standard?

(I consider this a defect, and claim the type of &Gimme::i really should be int Gimme::* even though i is a member of the base class. But I don't see anything in the Standard that makes it so, and there's a very specific example showing this.)


I realize some people may be surprised that the third commented approach (second ideone test case) actually fails. That's because the correct way to think about protected is not "my derived classes have access and no one else" but "if you derive from me, you will have access to these inherited variables contained in your instances, and no one else will unless you grant it". For example, if Button inherits Control, then protected members of Control within a Button instance are accessible only to Control, and Button, and (assuming Button doesn't prohibit it) the actual dynamic type of the instance and any intervening bases.
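The Button/Control description above can be sketched as follows. This is only an illustration: the member name `handle` and the helper names `peek` and `peek_default_button` are hypothetical and do not come from the question.

```cpp
class Control {
  protected:
    int handle = 0;  // hypothetical protected member
};

class Button : public Control {
  public:
    // OK: the object expression has type Button, so a member of Button
    // may reach the inherited protected member.
    static int peek(const Button& b) { return b.handle; }

    // Ill-formed if uncommented: the object expression has type Control,
    // which is neither Button nor a class derived from Button.
    // static int peek(const Control& c) { return c.handle; }
};

// Small helper so the accepted case can be exercised.
int peek_default_button() {
    Button b;
    return Button::peek(b);
}
```

Calling `peek_default_button()` goes through the accepted overload only; the commented-out overload is the case that the third "forbidden" line in the question runs into.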

This loophole subverts that contract and completely opposes the spirit of rule 11.4p1:

An additional access check beyond those described earlier in Clause 11 is applied when a non-static data member or non-static member function is a protected member of its naming class. As described earlier, access to a protected member is granted because the reference occurs in a friend or member of some class C. If the access is to form a pointer to member (5.3.1), the nested-name-specifier shall denote C or a class derived from C. All other accesses involve a (possibly implicit) object expression. In this case, the class of the object expression shall be C or a class derived from C.


Thanks to AndreyT for linking http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_closed.html#203 which provides additional examples motivating a change, and calls for the issue to be brought up by the Evolution Working Group.


Also relevant: GotW 76: Uses and Abuses of Access Rights

Ben Voigt
  • Yes, I don't see what protection has been stripped off in C++11. – CB Bailey Jun 06 '13 at 21:13
  • @Charles: I didn't mean to suggest that C++98 didn't have the same loophole. Just that this exists in the current version. – Ben Voigt Jun 06 '13 at 21:14
  • @BenVoigt: It is also possible to rob `private` members without casting – Andy Prowl Jun 06 '13 at 21:14
  • @Andy: Can you provide an example (that doesn't invoke undefined behavior)? – Ben Voigt Jun 06 '13 at 21:24
  • @Ben: Johannes Schaub came up with this trick. Here is the [link](http://bloglitb.blogspot.de/2011/12/access-to-private-members-safer.html) – Andy Prowl Jun 06 '13 at 21:26
  • @Andy: That's astounding... and I wonder why there's no access check on `&A::a` in this context. – Ben Voigt Jun 06 '13 at 21:34
  • @BenVoigt: No idea, but I honestly don't think it is such a bad thing (I meant the fact that it is possible to circumvent the accessibility rules *by applying a great effort*). I think what is important is that the language itself does a great deal in not letting you break those rules by mistake, or without being well aware of what you are doing. – Andy Prowl Jun 06 '13 at 21:38
  • @BenVoigt It's because of 14.7.2 p12 "The usual access checking rules do not apply to names used to specify explicit instantiations." – bames53 Jun 07 '13 at 15:17
  • @bames53: Yes I did see that rule, but I wonder why it's there. – Ben Voigt Jun 07 '13 at 16:16
  • @AndyProwl: I don't like circumventing rules *by applying a great effort*. I like the way `reinterpret_cast` works -- it permits subversion of the rules in a way that is easy to use yet very obvious. I think that if the language designers want to leave the door open to bypassing access checks, for example for serialization of class objects, a `private_access(` *`member-access-expression`* `)` syntax would be much much better. But the serialization use case doesn't really sway me, since POD objects can be trivially serialized and non-POD ones require help from the object designer. – Ben Voigt Jun 07 '13 at 16:20
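
For readers who do not follow the link, the explicit-instantiation trick discussed in the comments above looks roughly like this. It is a sketch of the general pattern, not a copy of the linked post: the names `Safe`, `secret`, `Rob`, `SecretTag`, `get`, and the two `read_*` helpers are all hypothetical. It relies on the rule bames53 cites, that the usual access checks do not apply to names used to specify an explicit instantiation.

```cpp
class Safe {
    int secret = 42;  // private by default
};

// The friend function defined here is found through argument-dependent
// lookup on Tag and simply returns the pointer-to-member it was given.
template <typename Tag, typename Tag::type Ptr>
struct Rob {
    friend typename Tag::type get(Tag) { return Ptr; }
};

// Tag type: names the pointer-to-member type and declares the friend
// so that ADL can find it.
struct SecretTag {
    using type = int Safe::*;
    friend type get(SecretTag);
};

// No access check is applied to &Safe::secret in an explicit instantiation.
template struct Rob<SecretTag, &Safe::secret>;

int read_secret(Safe& s) { return s.*get(SecretTag{}); }

int read_default_secret() {
    Safe s;
    return read_secret(s);
}
```

Nothing here involves a cast or undefined behavior, which is what makes the pattern surprising; whether it should be allowed is the design question debated in this thread.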

2 Answers


I have seen this technique, which I call the "protected hack", mentioned quite a few times here and elsewhere. Yes, this behavior is correct, and it is indeed a legal way to circumvent protected access without resorting to any "dirty" hacks.

When m is a member of class Base, the problem with making the expression &Derived::m produce a pointer of Derived::* type is that pointers to class members are contravariant, not covariant. The resulting pointer would be unusable with Base objects. For example, this code compiles

struct Base { int m; };
struct Derived : Base {};

int main() {
  int Base::*p = &Derived::m; // <- 1
  Base b;
  b.*p = 42;                  // <- 2
}

because &Derived::m produces an int Base::* value. If it produced an int Derived::* value, the code would fail to compile at line 1. And if we attempted to fix it with

  int Derived::*p = &Derived::m; // <- 1

it would fail to compile at line 2. The only way to make it compile would be to perform a forceful cast

  b.*static_cast<int Base::*>(p) = 42; // <- 2

which is not good.

P.S. I agree, this is not a very convincing example ("just use &Base::m from the beginning and the problem is solved"). However, http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_closed.html#203 has more info that sheds some light on why this decision was made originally. They state

Notes from 04/00 meeting:

The rationale for the current treatment is to permit the widest possible use to be made of a given address-of-member expression. Since a pointer-to-base-member can be implicitly converted to a pointer-to-derived-member, making the type of the expression a pointer-to-base-member allows the result to initialize or be assigned to either a pointer-to-base-member or a pointer-to-derived-member. Accepting this proposal would allow only the latter use.
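
The "widest possible use" argument in that note can be checked with a small sketch. The variable names `to_base` and `to_derived` and the function `demo` are hypothetical, and the default member initializer is added only so the example has well-defined values.

```cpp
#include <type_traits>

struct Base { int m = 0; };
struct Derived : Base {};

// Under the current rule, the expression has pointer-to-base-member type.
static_assert(std::is_same<decltype(&Derived::m), int Base::*>::value,
              "&Derived::m is an int Base::*");

int Base::*    to_base    = &Derived::m;  // exact type match
int Derived::* to_derived = &Derived::m;  // implicit base-to-derived conversion

int demo() {
    Base b;
    Derived d;
    b.*to_base = 1;     // usable with a Base object
    d.*to_derived = 2;  // usable with a Derived object
    return b.m * 10 + d.m;
}
```

If `&Derived::m` had type `int Derived::*`, only the `to_derived` initialization would compile without a cast, which is the trade-off the note describes.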

AnT stands with Russia
  • But it lets you operate on objects of any subclass of `Encapsulated`, even those that block further inheritance! I can't imagine that this is *correct*, even if it is *conformant*. – Ben Voigt Jun 06 '13 at 20:54
  • @Ben Voigt: No argument, it does look like a huge compromise. – AnT stands with Russia Jun 06 '13 at 21:01
  • WRT your edit, the whole point is to make the pointer-to-member unusable with `Base` objects. If you want a pointer-to-member-of-`Base`, use `&Base::m`. – Ben Voigt Jun 06 '13 at 21:05
  • I get your point. Also, see here http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_closed.html#203 . They have more convincing motivating examples. – AnT stands with Russia Jun 06 '13 at 21:12
  • Thanks for the link. As far as I can tell, all those examples support the change (they are broken under the current rule). So I disagree with your second-to-last sentence. They are just additional proof that the original decision was not well thought-out. – Ben Voigt Jun 06 '13 at 21:17
  • I would make a further argument in terms of coupling: Code that assigns `int Base::*p = &Derived::i;` is already closely coupled to `Base`, and will already break if the meaning of `Derived::i` changes (by introducing a declaration that hides the `Base` member), and so `&Base::i` doesn't introduce any additional maintenance burden. – Ben Voigt Jun 06 '13 at 21:21

The main thing to keep in mind about access specifiers in C++ is that they control where a name can be used. They do not actually do anything to control access to objects. "Access to a member" in the context of C++ means "the ability to use a name".

Observe:

#include <iostream>

class Encapsulator {
  protected:
    int i = 0;  // initialized so that reading it in main is well-defined
};

struct Gimme : Encapsulator {
    using Encapsulator::i;
};

int main() {
  Encapsulator e;
  std::cout << e.*&Gimme::i << '\n';
}

The expression e.*&Gimme::i is allowed because it does not access a protected member at all. We are accessing the member created inside Gimme by the using declaration. That is, even though a using declaration does not imply any additional sub-objects in Gimme instances, it still creates an additional member. Members and sub-objects are not the same thing, and Gimme::i is a distinct public member that can be used to access the same sub-objects as the protected member Encapsulator::i.


Once the distinction between 'member of a class' and 'sub-object' is understood it should be clear that this is not actually a loophole or unintended failure of the contract specified by 11.4 p1.

That one can create an accessible name for, or otherwise provide access to, an otherwise un-nameable object is the intended behavior even though it is different from some other languages and may be surprising.

bames53
  • No, access specifiers do not control where a name can be used. Overload resolution takes place before access checks. And secondly, if `&Gimme::i` had the type `int Gimme::*` then it would still access `Encapsulator::i`, but only within objects of type `Gimme` (or some subclass). If your interpretation were correct, then the text from 11.4p1 which I bolded in the question would not exist. – Ben Voigt Jun 07 '13 at 14:01
  • @BenVoigt Yes, overload resolution comes first; by 'access specifiers control where a name can be used' I mean that once overload resolution determines what name you are attempting to use, access specifiers determine if you are allowed to use the name. Secondly, read the text you bolded again, but with the understanding that 'member' does not mean the same thing as 'sub-object'. The phrase "access to a protected member" applies to the name `Encapsulator::i` but simultaneously does not apply to the name `Gimme::i` because these are distinct members. The type of `&Gimme::i` does not affect this – bames53 Jun 07 '13 at 14:20
  • But the type of `&Gimme::i` affects what objects the pointer can later be used with. I fully agree that if `Encapsulator` returns a pointer to its member subobject, any code can then dereference that pointer without a further check. But `Gimme` should not be able to get a pointer-to-member which is generically applicable to all `Encapsulator` objects... to be consistent with the bolded text, `&Gimme::i` should return a pointer which can be applied only to `Gimme` objects. – Ben Voigt Jun 07 '13 at 14:30
  • @BenVoigt The fact that the public member `Gimmie::i` can be used is not inconsistent with the bolded text; the bolded text applies only to protected members. Whether the result of using this public member should be usable with the base class or other class' derived from the same base class is a completely independent design decision unrelated to the bolded text. – bames53 Jun 07 '13 at 15:19
  • It's inconsistent with the bolded text to do an access check on `Gimme::i` and then access an object not of type `Gimme`. – Ben Voigt Jun 07 '13 at 16:22
  • Your answer's claim that `Gimme::i` is a different member (but same subobject) as `Encapsulator::i` just doesn't work. `&Gimme::i` is a pointer-to-member (not "pointer-to-subobject"), and it's a pointer-to-`Encapsulator`-member because there is no distinct member `Gimme::i`. It's not distinct, it's inherited with changed access specifier. – Ben Voigt Jun 07 '13 at 16:25
  • I'll admit that the language of the spec is muddled on its use of the term 'member'. But it's clear that there are distinct entities `Gimmie::i` and `Encapsulator::i` that simultaneously have distinct access specifications. It's not the case that the using-declaration "changes" the access of some third entity common between the two names. If that were the case then this ought to be legal: `Gimmie g; g.*&Encapsulator::i`. – bames53 Jun 07 '13 at 17:45
  • @BenVoigt "It's inconsistent with the bolded text to do an access check on Gimme::i and then access an object not of type `Gimmi`." Nowhere in the spec, let alone the quoted section, does it say that names declared in a class are only capable of accessing sub-objects of instances of that class. The spec says the access check is done on the name, and also that expressions using a name can possibly produce values usable for accessing sub-objects on instances of completely different types. – bames53 Jun 07 '13 at 17:46
  • Are you saying it would be legal to write `struct UseIt : Encapsulator { static void use( Encapsulator& e ) { e.UseIt::i = 1; } };`? But that's forbidden by the part of the quote I bolded. And for consistency, `e.*&UseIt::i = 1;` also shouldn't be allowed. – Ben Voigt Jun 07 '13 at 21:27
  • When you say "it's clear that there are distinct entities `Gimmie::i` and `Encapsulator::i` that simultaneously have distinct access specifications", wouldn't you say that `&Gimmie::i` should be a pointer-to-member that accesses the first one? And therefore can only be used with an object that contains a `Gimmie::i` member? The problem is that they *aren't* distinct entities. `&Gimmie::i` produces a pointer to `Encapsulator::i`. – Ben Voigt Jun 07 '13 at 21:28
  • @BenVoigt "Are you saying it would be legal to write [...]?" No, and not because of anything in the quoted statement. It is illegal because `.` needs its right side to be the name of a member of the left side, and `UseIt::i` is not the name of a member of Encapsulator. `.*` on the other hand allows the right side to be any expression that happens to be of the appropriate type. – bames53 Jun 07 '13 at 22:12
  • "The problem is that they aren't distinct entities. `&Gimmie::i` produces a pointer to `Encapsulator::i`" The expression `&Gimmie::i` happens to have the same type and value as the expression `&Encapsulator::i`, just as an expression `f()` could have the same type and value as an expression `g()`. That doesn't mean an id-expression `Gimmie::i` is the same as an id-expression `Encapsulator::i` any more than `g` and `f` must be the same. – bames53 Jun 07 '13 at 22:44
  • In C++, if two expressions have the same type and address, that means they are the same. We're not talking about value in general, we're talking about address. – Ben Voigt Jun 09 '13 at 01:18
  • @BenVoigt If two _objects_ have the same type and address they're the same object. One can't apply the same logic to conclude that `Gimmie::i` and `Encapsulator::i` are the same just because applying the so-called 'address of' operator gives the same "address" for both; They're not objects and pointers-to-members aren't really addresses like pointers to objects are. Anyway, we've already identified several ways in which `Gimmie::i` and `Encapsulator::i` behave differently and therefore they must not be the same. – bames53 Jun 09 '13 at 03:57