As a corollary to a question I asked previously, I am curious why the type of the type of a name ('category' of that name) in a template is set in the first phase of the 2 phase lookup, when the category itself can also depend on the template parameter. What is the real gain of this behavior?

A little clarification - I think I have a fair understanding of how the 2 phase look-up works; what I'm trying to understand is why a category of a token is definitively determined in phase 1, which differs from when dependent types are determined (in phase 2). My argument is that there is a very real gain in simplifying a difficult syntax, to make code easier to write and to read, so I am curious what the compelling reason to restrict category evaluation to phase 1 is. Is it simply for better template validation/error messages before template instantiation, or a marginal increase in speed? Or is there some fundamental attribute of templates that makes phase 2 category evaluation unfeasible?

Rollie
  • The type of the type of name is not set in the first phase. The only thing that is known is that it refers to a type and not a non-type. The actual type is not known until instantiation. – Jesse Good Oct 15 '12 at 02:46
  • @JesseGood: Interpret the *type* in the question as the *category*, i.e. whether it refers to a type, template or neither. – David Rodríguez - dribeas Oct 15 '12 at 04:47
  • What is your main concern, what is the advantage of two phase lookup, or why within the context of two phase lookup the *category* of each token must be fixed during the first pass? I have tried to provide an answer a couple of times, but it ends up becoming too broad and I still felt that I might not be tackling the real issue. – David Rodríguez - dribeas Oct 15 '12 at 13:18
  • The latter - why must the category (is this the proper term for this question?) of each name be determined during phase 1. There is a clear advantage to determining this category in the 2nd phase as evidenced by the referenced question, so to me it makes sense to allow for category to be determined in the 2nd phase when multiple categories are possible. – Rollie Oct 15 '12 at 14:56

2 Answers

The question could be twofold: why do we want two-phase lookup in the first place and, given that we have two-phase lookup, why is the interpretation of the tokens fixed during the first phase? The first is the harder question to answer: it is a design decision in the language with advantages and disadvantages, and which of them weigh more depends on where you stand.

The second part, which is what you are interested in, is actually much simpler: why, in a C++ with two-phase lookup, is the meaning of each token fixed during the first phase rather than left to be interpreted in the second? The reason is that C++ has a context-sensitive grammar, and the interpretation of the tokens is highly dependent on the context. Without fixing the meaning of the tokens during the first phase, you won't even know which names need to be looked up in the first place.

Consider a slightly modified version of your original code, where the literal 5 is substituted by a constant expression, and assuming that you did not need to provide the template or typename keywords that bit you the last time:

const int b = 5;
template<typename T>
struct Derived : public Base<T> {
    void Foo() { 
       Base<T>::Bar<false>(b);   // [1]
       std::cout << b;           // [2]
    }
};

What are the possible meanings of [1] (ignoring the fact that in C++ this is determined by adding typename and template)?

  1. Bar is a static template function that takes a single bool as template argument and an integer as argument. b is a non-dependent name that refers to the constant 5. *

  2. Bar is a nested template type that takes a single bool as template argument. b is an instance of that type defined inside the function Derived<T>::Foo and not used.

  3. Bar is a static member variable of a type X for which there is a comparison operator< that takes a bool and yields as result an object of type U that can be compared with operator> with an integer.

Now the question is how we proceed with resolving the names before the template arguments are substituted in (i.e. during the first phase). If we are in case 1 or 3, then b needs to be looked up and the result can be substituted into the expression. Case 1 yields your original code, Base<T>::template Bar<false>(5), and case 3 yields operator>( operator<( Base<T>::Bar, false ), 5 ). In case 2 the code after the first phase would be exactly the same as the original code: Base<T>::Bar<false> b; (removing the extra ()).
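The three readings can be spelled out explicitly, which may make the difference easier to see. This is only a sketch: FnBase, TypeBase, VarBase, X, U and the three function templates are hypothetical names that do not appear in the question, and T::Bar stands in for Base<T>::Bar to keep the example short.

```cpp
const int b = 5;

// Reading 1: Bar is a static member function template.
struct FnBase {
    template<bool B> static int Bar(int x) { return B ? x : -x; }
};

// Reading 2: Bar is a nested class template.
struct TypeBase {
    template<bool B> struct Bar { int value = B ? 1 : 2; };
};

// Reading 3: Bar is a static data member of a type X whose operator<
// takes a bool and yields a U that can be compared to an int with operator>.
struct U { int tag; bool operator>(int rhs) const { return tag > rhs; } };
struct X { U operator<(bool) const { return U{10}; } };
struct VarBase { static X Bar; };
X VarBase::Bar;

template<typename T>
int call_reading() {
    // `template` fixes Bar as a template; b is looked up and is the constant ::b.
    return T::template Bar<false>(b);
}

template<typename T>
int type_reading() {
    // `typename` + `template` fix Bar<false> as a type; b is a new local object.
    typename T::template Bar<false> b;
    return b.value;
}

template<typename T>
bool compare_reading() {
    // No disambiguator: parsed as the expression (T::Bar < false) > b.
    return T::Bar < false > (b);
}
```

With these definitions, call_reading<FnBase>() calls the function, type_reading<TypeBase>() declares a local object, and compare_reading<VarBase>() evaluates two comparisons. The same token sequence means three different things, and only the keywords decide which one the compiler commits to in the first phase.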

The meaning of the second line [2] then depends on how we interpreted the first one [1]. In case 2 it represents a call to operator<<( std::cout, Base<T>::Bar<false> & ), while in the other two cases it represents operator<<( std::cout, 5 ). Again, the implications extend beyond the type of the second argument: in case 2 the name b within Derived<T>::Foo is dependent, so it cannot be resolved during the first phase and must be postponed to the second phase (where it will also affect lookup by adding the namespaces of Base and the instantiating type T to Argument Dependent Lookup).

As the example shows, the interpretation of the tokens impacts the meaning of the names, and that in turn affects what the rest of the code means, which names are dependent, and thus what else needs to be looked up during the first phase. At the same time, the compiler does perform checks during the first pass. If the tokens could be reinterpreted during the second pass, the checks and the lookup results from the first pass would be rendered useless (imagine that during the first pass b had been substituted with 5, only to find out during the second phase that we are in case 2!), and everything would have to be checked again during the second phase.

The existence of two-phase lookup depends on the tokens being interpreted, and their meaning selected, during the first phase. The alternative is single-pass lookup, as Visual Studio does.
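A small sketch of what "looked up during the first phase" means in practice may help here. The names pick and Caller are hypothetical; the point is only that a non-dependent call is bound where the template is defined, so a later, better-matching overload is not considered by a compiler that implements two-phase lookup.

```cpp
// Only this overload is visible when Caller is defined.
inline char pick(double) { return 'd'; }

template<typename T>
struct Caller {
    // pick(1) does not depend on T, so it is resolved in the first phase,
    // at the point of definition, to pick(double).
    char run() const { return pick(1); }
};

// Declared after the template: an exact match for pick(1), but too late
// to take part in the first-phase lookup above.
inline char pick(int) { return 'i'; }
```

On a compiler that implements two-phase lookup, Caller<int>().run() selects pick(double). A compiler that defers all lookup to instantiation would see both overloads and could choose pick(int) instead, which is the kind of silent change in meaning that fixing the interpretation early rules out.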


* I am simplifying the cases here. In the Visual Studio compiler, which does not implement two-phase lookup, b could also be a member of Base<T> for the currently instantiating type T (i.e. it can be a dependent name).
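The footnote's situation can be sketched as follows, assuming a hypothetical Base<T> that happens to have a member named b. Under two-phase lookup the unqualified b is not looked up in the dependent base class, so it binds to the namespace-scope constant; writing this->b makes the name dependent and defers the lookup to instantiation.

```cpp
const int b = 5;

template<typename T>
struct Base {
    int b = 42;   // hypothetical member that shares the name b
};

template<typename T>
struct Derived : Base<T> {
    // Non-dependent name: bound in the first phase to ::b.
    int unqualified() const { return b; }

    // Dependent name: looked up in the second phase, finds Base<T>::b.
    int qualified() const { return this->b; }
};
```

With a two-phase compiler, unqualified() yields the global constant and qualified() yields the base member. A compiler that does all lookup at instantiation time could find Base<T>::b in both functions.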

David Rodríguez - dribeas
  • Much of this answer seems to essentially be saying "because it's 2 phase lookup, the token category is determined in the first phase", but while the examples are illustrative of the question, they don't provide a good answer. I still can't say I know what the gain is in ensuring token categories are defined in the first phase; why not just allow ambiguous categories to be evaluated in the 2nd, the way ambiguous types are? Even if we do specify in your example that Bar is a template type, the compiler is still unable to determine the validity of `cout << b;` until the `Derived` is instantiated. – Rollie Oct 16 '12 at 05:34
  • @Rollie: That is exactly why I asked whether you were interested in the advantages of two-phase lookup over one-phase lookup or, given two-phase lookup, whether the symbols need to be categorized in the first phase. The example is quite illustrative: a single line of code and three different options. If you stored, as you ask, the different options for lookup, you would end up with an exponentially growing set of alternative parsings that would need to be maintained together with the outcome of lookup for each one of them, only to discard all but one in the second phase. [...] – David Rodríguez - dribeas Oct 16 '12 at 12:25
  • [...] Do you realize that this alternative does not make sense? It basically means: have a first phase where you do a lot of work and track multiple alternatives. Then do a second phase where you do the same work, basically ignoring the result of the (costly) first phase. If you wanted to provide that in the language the solution is simple: avoid two-phase lookup and perform just the second phase, which is the alternative Visual Studio took. But two-phase lookup that tracks all alternatives in the first phase is impractical. Now, the different question is whether you want two phases in the first place or not. – David Rodríguez - dribeas Oct 16 '12 at 12:29
  • I think they are sort of the same question - it sounds like the reason category is set in phase 1 is because it facilitates 2 phase lookup, and there is no technical prohibition on making an exception for cases where a token's category can't be conclusively determined. If this is the case, that is more-or-less the information I was looking for. Again, I already *do* understand the way it works, so the example is just rehashing the previous question in large part; the question here is why it is implemented one way vs another. – Rollie Oct 16 '12 at 16:10
  • @Rollie: No, it is not because it facilitates the second phase; the second phase is neither simpler nor more complicated. On the other hand, without fixing the interpretation of the tokens you don't even know what needs to be looked up (i.e. `b` in the example: is it a new identifier or an identifier that needs to be looked up?). What you might be missing here is that in C++ the token category **is** conclusively determined in the first phase, based on just the presence (or absence) of the `typename` and `template` keywords. Those are sufficient to yield a unique interpretation during phase one. – David Rodríguez - dribeas Oct 16 '12 at 16:13
  • @Rollie: Regarding whether they are or not the same question, they are not. As a metaphor consider that you want to get downtown, if the question is 'do you need a ticket to ride the bus' the answer is 'yes', and it is unrelated to other options: you can walk, or drive yourself (for neither of those you need the ticket, but if you want the bus --two phase lookup--, then you need the ticket) – David Rodríguez - dribeas Oct 16 '12 at 16:19
  • @Rollie: **Why?** is *To enable static analysis and get better error messages earlier in the development process.* Why do you think people are begging for **Concepts** when they don't enable anything templates can't already do, and they make code significantly longer? Because they improve static analysis and error messages, potentially a LOT. The C++ designers decided that more verbose code is an appropriate tradeoff to obtain these benefits. – Ben Voigt Oct 16 '12 at 16:44

Much of the advantage of C++ is that it's a strictly checked language. You express the intent of your program as clearly as possible, and the compiler tells you if that intent is violated.

I can't imagine that you would ever write Base<T>::Bar<false>(b); (from dribeas's example) and not have a particular interpretation that you want. By telling the compiler which interpretation you mean (Base<T>::template Bar<false>(b);), it can generate a meaningful error if someone provides a type that has a static member Bar or a nested template type instead of a member function template.

Other languages are designed to stress terseness over static analysis; for example, many dynamic languages have a great number of "do what I mean" rules. That causes fun when the compiler turns nonsensical code into something unpredictable, with no errors. (Case in point: Perl. I love it for text manipulation, but goodness, DWIM is annoying. Almost everything is a runtime error; there's barely any static checking to speak of.)

Ben Voigt
  • This question is derived from non-example code that was running into the issue. Yes, it can happen. No, you don't always explicitly state your intent to the compiler, hence the C++11 `auto` keyword and the template function parameter type deduction rules (`swap(a, b)` instead of `swap<T>(a, b)`). "Go use another language" is not an answer. – Rollie Oct 16 '12 at 06:13
  • @Rollie: `auto` never gives you a function call, or a static member access, now does it? And you can cut down dramatically on use of `typename` using `auto`. I'm sorry, I don't mean for my answer to read as "you shouldn't be using C++", I'll edit that part. – Ben Voigt Oct 16 '12 at 13:19
  • @Rollie: Also, I don't doubt that you ran into this issue with real code. That doesn't change my assertion that *you had a particular interpretation in mind when you wrote it, and don't want the compiler to silently choose a different one*. – Ben Voigt Oct 16 '12 at 13:24
  • +1, the answer is sensible, whether Rollie likes it or not. Rollie, whether you always make your intent explicit in code or not is irrelevant. In this particular case the compiler cannot infer the intent from your code. This is equivalent to `std::min(-1,1u)`: it is ambiguous, and you must say whether you want `std::min<int>` or `std::min<unsigned>`. – David Rodríguez - dribeas Oct 16 '12 at 16:16
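
The `std::min(-1,1u)` comparison in the last comment can be sketched as below. The wrapper names min_as_int and min_as_unsigned are hypothetical; the unqualified call `std::min(-1, 1u)` is left out because deduction fails when the two arguments have different types.

```cpp
#include <algorithm>

// Explicit template argument int: 1u is converted to the int 1.
inline int min_as_int() { return std::min<int>(-1, 1u); }

// Explicit template argument unsigned: -1 is converted to the largest
// unsigned value, so 1u is the smaller of the two.
inline unsigned min_as_unsigned() { return std::min<unsigned>(-1, 1u); }
```

The two spellings give different answers for the same arguments, which is why the compiler asks the programmer to choose rather than guessing.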