What are all the syntax problems introduced by the usage of angle brackets in C++ templates?

Question

In C++, templates are instantiated with angle brackets, as in vector<int>, and Java and C# have adopted the same syntax for their generics.

The creators of D, however, have been quite vocal about the problems that angle brackets bring and they made a new syntax foo!(int) — but I've never seen too many details about what problems angle brackets bring, exactly.

One of them arose when instantiating a template with another template, as in vector<vector<int>>, which would cause some (older?) compilers to confuse the trailing `>>` with the bit-shift or streaming operators. The solution was to insert a space between the two angle brackets, but haven't compilers become able to parse that syntax, nowadays?

Another problem arose when using the greater-than operator, as in foo<3 > 2>. The parser would think that the operator actually closes the template instantiation; the fix was to introduce parentheses: foo<(3 > 2)>. But I don't think there are that many cases where you need to do this and, at any rate, I'd rather have to type the extra parentheses when they are needed, instead of introducing new syntax and always having to type the exclamation mark.
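
A minimal sketch of the two cases above, assuming a hypothetical foo that takes a bool template argument (the definition is illustrative only, not taken from any real library):

```cpp
#include <cassert>
#include <vector>

// Hypothetical template taking a bool non-type argument.
template <bool Condition>
struct foo { static const bool value = Condition; };

// Since C++11, >> closes two template argument lists.
std::vector<std::vector<int>> nested;
// The C++03 spelling with a space, still valid today.
std::vector<std::vector<int> > nested_with_space;

// foo<3 > 2> would not compile: the first > closes the argument list.
// Parentheses keep the comparison inside the argument.
foo<(3 > 2)> parenthesized;

static_assert(foo<(3 > 2)>::value, "3 > 2 is true");

// Small usage check.
void demo() {
    assert(nested.empty());
    assert(nested_with_space.empty());
}
```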

What other problems are there with angle brackets that made the D developers create a new syntax?

FWIW, Java and C# might not have the same issues as their generics are limited to types. — Daniel James, Sep 05 '11 at 07:06
Just to clarify: A C++03 compiler *must* parse the `>>` in `vector<vector<int>>` as a shift operator. That has, fortunately, been changed in a completely incompatible way in C++0x. — Christopher Creutzig, Sep 05 '11 at 07:07
@phresnel: Are you absolutely sure? Do you have a link to some complete analysis? (And what does “fully backward compatible” mean, anyway? The best approximation I could think of would be “does not change the meaning of anything that used to parse correctly”, and that is not the case. `binary>1>` works in C++03 and has a parse error in C++0x.) — Christopher Creutzig, Sep 06 '11 at 07:26
@Paul: Incompatible in that the `>>` in this place used to definitely be a shift operator and now definitely is not. The only resulting incompatibilities I currently know are changes in what character sequences make up correctly formed C++ code. I believe that the code that is no longer working was never all that useful and that the things that were not working in the past are things that really should work, so I'm happy. There might be subtle changes to program meaning in artificially constructed cases, I don't know for sure. — Christopher Creutzig, Sep 06 '11 at 07:29
@Christopher: I think we were both walking on extremes a tad too much. You wrote "completely incompatible", I wrote "fully compatible", but as you point out, we are both wrong :D — Sebastian Mach, Sep 06 '11 at 07:37

score 27 · Answer 1 · answered Sep 05 '11 at 07:00

Personally, the most hideous problem I have seen is the invocation of template functions in dependent context:

template <typename T>
void foo(T t) {
  t.bar<3>();
}

This looks simple, but it is in fact incorrect. The C++ Standard requires the template keyword to disambiguate between the comparison t.bar < 3 and a member template invocation, yielding:

t.template bar<3>(); // iirk

litb made some very interesting posts regarding the possible interpretation a compiler could come up with.

Regarding the >> issue, it's fixed in C++0x, but requires more clever compilers.
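
For reference, a compilable sketch of the fix, assuming a hypothetical Widget type with a member function template bar (neither name comes from real code; they only stand in for whatever T turns out to be):

```cpp
#include <cassert>

// Hypothetical type with a member function template named bar.
struct Widget {
    template <int N>
    int bar() const { return N; }
};

template <typename T>
int foo(T t) {
    // The type of t depends on T, so the compiler cannot know that bar
    // names a template. The template keyword makes < open an argument list.
    return t.template bar<3>();
}

// Small usage check.
void demo() {
    assert(foo(Widget()) == 3);
}
```

Without the keyword, the < after t.bar is read as the less-than operator, which is why the shorter spelling is rejected for a type like Widget.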

Matthieu M.

Yikes! I never knew about this pitfall - and I don't think I saw the `o.template f` syntax before, either. Do you happen to have a link to those `very interesting posts` by `litb`? I'm curious. :-) – Frerich Raabe Sep 05 '11 at 07:08
@Frerich: there is [this post](http://stackoverflow.com/questions/610245/where-and-why-do-i-have-to-put-template-and-typename-on-dependent-names/613132#613132) for an in-depth explanation of template and typename, but it's not the best I remember... he just posted too many answers :/ – Matthieu M. Sep 05 '11 at 07:28
I just came across this post again and was like "OH I didn't know that!!" then I saw my own comment above... – user541686 Jul 29 '12 at 22:23
@Mehrdad: Looks like you encountered your nemesis ;) – Matthieu M. Jul 30 '12 at 07:08
@MatthieuM. `t.bar<3>(5);` is IMO more obviously ambiguous: for `struct A { int bar; };`, `foo( A{} )` would contain a `(t.bar < 3) > 5;` [Live example](http://coliru.stacked-crooked.com/a/39cf491e2af9f613) (similarly, `int()` can mean a function type or a value-initialized `int`; `t.bar(5);`) Oh, and for things like `t.bar < 3 > (a == b);` you might actually need those parens. – dyp Jan 31 '14 at 23:31
@dyp: Of course, none of those would ever be ambiguous if in a moment of sanity we just ruled out *ordering* booleans. `false < true` is not really meaningful, and it is just an accident of C (which did not have a boolean type and promoted `<` arguments to `int`). – Matthieu M. Feb 01 '14 at 11:37
@MatthieuM. I agree. But `t.bar < 3` might not yield a `bool` (overloaded operators). Though, I can't imagine an interpretation that is legal and *not* misusing some language feature. One might think that the *default* interpretation should be a template-id. – dyp Feb 01 '14 at 12:25

score 26 · Accepted Answer · answered Sep 05 '11 at 06:59

but haven't compilers become able to parse that syntax, nowadays?

Of course. But it’s far from trivial. In particular, it prevents you from implementing a clean separation between a context-unaware lexer and the parser. This is particularly irksome for syntax highlighters and other support tools that need to parse C++ but don’t want to, or can’t, implement a fully-fledged syntactic analyser.

It makes C++ so much harder to parse that a lot of tools simply won’t bother. This is a net loss for the ecosystem. Put differently: it makes developing a parsing tool much more expensive.

For instance, ctags fails for some template definitions, which makes it unusable with our current C++ project. Very annoying.

But I don't think there are that many cases where you need to [distinguish between angle brackets and less-than]

It doesn’t matter how often you need to do this. Your parser still needs to handle this.

D’s decision to drop angle brackets was a no-brainer. Any one reason would have sufficed, given that it’s a net benefit.

Konrad Rudolph

I wish more tools would integrate the Clang parser. There is a project already to get it to work in `vi`: [clang_complete](http://www.vim.org/scripts/script.php?script_id=3302) – Matthieu M. Sep 05 '11 at 07:04
@MatthieuM. The clang parser has only recently reached maturity (and its C++11 support is still sketchy). I foresee that we will have a lot of clang-based tools in the near future. But … not yet. In fact, why don’t we already have tons of GCC-based tools? The answer is their deliberately complex architecture coupled with their stupid licensing policies. GNU has really burnt money there. – Konrad Rudolph Sep 05 '11 at 07:05
Definitely young, indeed. Apple plans full C++11 support by the end of the year, if I recall correctly, so 2012 should be Clang's year :) As for gcc's policy... my comment isn't politically correct, I fear. – Matthieu M. Sep 05 '11 at 07:09
I don't think that the `foo!(int)` syntax is going to be very parser-friendly either. `foo{int}` would be a bit cleaner. – MSalters Sep 05 '11 at 14:47
@MSalters Which problem do you foresee? The serious problem is that angle brackets sometimes match (open/close bracket) and sometimes not (less than, greater than). While parentheses must always match. This allows the lexer to treat them completely uniformly. Whether there’s an extra symbol attached to an opening parenthesis isn’t particularly important (although the “overloading” with negation could and should have been avoided, to be sure). – Konrad Rudolph Sep 05 '11 at 14:57
@Konrad: Actually, it's not the `()` but the `!` which made me worried. But I then realized that there's no other grammar production in which it would be followed by `(`, and one-token-lookahead shouldn't be a real problem for C++ parsers ;) – MSalters Sep 05 '11 at 15:10
IIRC you can treat that `!` as a binary operator taking an identifier on the left and a type/value/parentheses-wrapped-list on the right. – BCS Sep 05 '11 at 16:36
I'm accepting this answer because it explains the core reason why the D folks made a new syntax. – Paul Manta Sep 06 '11 at 06:02
@PaulManta: I know this is old, but bonus points for D because you can omit the `()` in modern D for single args: `foo!int`. But yes, `foo!(bar)` was definitely a good idea. D docs do have a rationale themselves: http://digitalmars.com/d/1.0/templates-revisited.html (see "Argument Syntax"). – Tim Čas Aug 17 '14 at 22:31

score 10 · Answer 3 · answered Sep 05 '11 at 09:02

The issue is keeping the language grammar context-free. When the lexer tokenizes a program, it uses a technique called maximal munch, which means that it always takes the longest string that could form a token. That means that >> is treated as the right bit-shift operator. So, if you have something like vector<pair<int, int>>, the >> at the end is treated as the right bit-shift operator instead of as the two closing brackets of a template instantiation. For the lexer to treat >> differently in this context, it must be context-sensitive instead of context-free; that is, it has to actually care about the context of the tokens being parsed. This complicates the lexer and parser considerably. The more complicated the lexer and parser are, the higher the risk of bugs - and more importantly, the harder it is for tools to implement them, which means fewer tools. When stuff like syntax highlighting in an IDE or code editor becomes complicated to implement, it's a problem.
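
To make the maximal munch behaviour concrete, here is a toy sketch of a context-unaware lexer. The tokenize function and its operator table are hypothetical and far simpler than a real C++ lexer; they only illustrate why the trailing >> comes out as a single token:

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical toy lexer: splits input into identifiers and operators,
// always preferring the longest operator that matches (maximal munch).
std::vector<std::string> tokenize(const std::string& src) {
    static const char* const two_char_ops[] = {">>", "<<", ">=", "<=", "==", "::"};
    std::vector<std::string> tokens;
    std::size_t i = 0;
    while (i < src.size()) {
        unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c)) { ++i; continue; }
        if (std::isalnum(c) || c == '_') {
            // Consume a whole identifier or number.
            std::size_t start = i;
            while (i < src.size() &&
                   (std::isalnum(static_cast<unsigned char>(src[i])) || src[i] == '_'))
                ++i;
            tokens.push_back(src.substr(start, i - start));
            continue;
        }
        // Try two-character operators before falling back to one character.
        bool matched = false;
        for (const char* op : two_char_ops) {
            if (src.compare(i, 2, op) == 0) {
                tokens.push_back(op);
                i += 2;
                matched = true;
                break;
            }
        }
        if (!matched) {
            tokens.push_back(std::string(1, src[i]));
            ++i;
        }
    }
    return tokens;
}

// Small usage check: the lexer has no idea it is inside a template.
void demo() {
    assert(tokenize("vector<pair<int, int>>").back() == ">>");
    assert(tokenize("vector<pair<int, int> >").back() == ">");
}
```

A lexer like this cannot split the >> on its own; either the parser has to feed context back into it, or the parser has to break the token apart afterwards.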

By using !() - which would result in vector!(pair!(int, int)) for the same declaration - D avoids the context sensitivity issue. D has made a number of such choices in its grammar explicitly with the idea of making it easier for tools to implement lexing or parsing when they need to in order to do what they do. And since there's really no downside to using !() for templates other than the fact that it's a bit alien to programmers who have used templates or generics in other languages which use <>, it's a sound language design choice.

And how often you do or don't use templates which would create ambiguities when using the angle bracket syntax - e.g. vector<pair<int, int>> - isn't really relevant to the language. The tools must implement it regardless. The decision to use !() rather than <> is entirely a matter of simplifying the language for tools, not for the programmer. And while you may or may not particularly like the !() syntax, it's quite easy to use, so it ultimately doesn't cause programmers any problems beyond learning it and the fact that it may go against their personal preference.

Well, C++ isn’t context free anyway so this isn’t the most pressing issue. — Konrad Rudolph, Sep 05 '11 at 12:18
@Konrad, that may be true, but if D had used `<>`, it too would have been context sensitive. IIRC, D is context free. — BCS, Sep 05 '11 at 16:31
On the aesthetics side, I would say `to!int("123")` feels better than `to<int>("123")` — Alexander Malakhov, Sep 12 '11 at 07:00
Also, on `... is entirely a matter of simplifying the language for tools`. Wouldn't it also help with faster compilation? IIRC it's also one of the major goals — Alexander Malakhov, Sep 12 '11 at 07:03
If it results in faster compilation, it's because making the grammar context free makes it simpler for the compiler (which is a tool) to compile the code. So, it's really the same thing. — Jonathan M Davis, Sep 12 '11 at 09:34

Daniel James · Answer 4 · 2011-09-05T07:35:19.960

8

In C++ another problem is that the preprocessor doesn't understand angle brackets, so this fails:

#define FOO(X) typename something<X>::type

FOO(std::map<int, int>)

The problem is that the preprocessor thinks FOO is being called with two arguments: std::map<int and int>. This is an example of the wider problem, that it's often ambiguous whether the symbol is an operator or a bracket.
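
One common workaround, sketched here under the assumption that C++11 variadic macros are available, is to let the macro accept any number of arguments and rejoin them. The something and holder templates below are hypothetical stand-ins so that the example compiles:

```cpp
#include <cassert>
#include <map>
#include <type_traits>

// Hypothetical stand-in for `something` from the snippet above.
template <typename T>
struct something { typedef T type; };

// Original form: FOO(std::map<int, int>) is seen as two macro arguments.
// #define FOO(X) typename something<X>::type

// Variadic form: __VA_ARGS__ rejoins the comma-separated pieces.
#define FOO(...) typename something<__VA_ARGS__>::type

template <typename T>
struct holder {
    // Expands to: typedef typename something<std::map<int, T>>::type map_type;
    typedef FOO(std::map<int, T>) map_type;
    map_type member;
};

static_assert(std::is_same<holder<int>::map_type, std::map<int, int>>::value,
              "the macro argument survives the comma");

// Small usage check.
void demo() {
    holder<int> h;
    h.member[1] = 2;
    assert(h.member.size() == 1);
}
```

This only sidesteps the argument splitting; the preprocessor still does not understand angle brackets.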

edited Sep 05 '11 at 07:35

answered Sep 05 '11 at 07:02

Daniel James

But D doesn't even have macros, so this can't apply to them. – Paul Manta Sep 05 '11 at 07:04
Actually... any use of the comma operator would also fail, etc... This is more a failure of macros. – Matthieu M. Sep 05 '11 at 07:06
@Paul, yep, that's what I meant when I said 'Although in C++'. – Daniel James Sep 05 '11 at 07:07
@Matthieu you can put the comma operator in brackets (as you'd have to with functions), you can't for types (unless you do some clever hacking with function types). – Daniel James Sep 05 '11 at 07:07
The recommendation for preprocessor macros was always (even before the new first recommendation of “don't use them” was introduced, i.e., a really long time ago) to put every occurrence of a parameter in parentheses, `#define FOO(X) typename something<(X)>::type`. Without that, even simple things like `#define TWICE(x) 2*x` have funny consequences, as in `TWICE(2+3)==7`. But yes, even with that, the above wouldn't work. – Christopher Creutzig Sep 05 '11 at 07:10
@Christopher You can't do that because it isn't an expression, it's a type and it wouldn't solve the problem which is how the preprocessor parses the macro arguments. If you forbid the use of macros you've severely reduced the power of the language, although you might think that's a good thing. – Daniel James Sep 05 '11 at 07:15
@Daniel: I did point out that it wouldn't solve this problem. Using macros, which only care for characters, in C++, causes considerably more pain than gain, in my experience. It's not really a good thing you can't have a method called `isfinite`, e.g. – the C standard says that `isfinite` is a macro. YMMV, of course. – Christopher Creutzig Sep 05 '11 at 07:31
@Christopher Sorry I read it quickly and missed that, but `#define FOO(X) typename something<(X)>::type` is incorrect because you can't use brackets there. In C++ I always follow the convention of using upper case for a macro and wouldn't use one for something that should be an inline function. I mostly use them for generative purposes (i.e. not part of the public API). – Daniel James Sep 05 '11 at 07:38

score 4 · Answer 5 · edited Feb 13 '12 at 18:32

Have fun figuring out what this does:

bool b = A< B>::C == D<E >::F();
bool b = A<B>::C == D<E>::F();

Last time I checked, you could make it parse either way by changing what's in scope.
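
A sketch of how that can happen, with every declaration below invented for illustration. The same token sequence is compiled twice, once where A and D name templates and once where they name variables:

```cpp
#include <cassert>

// Global names that the "comparison" reading reaches through ::C and ::F.
int C = 5;
int F() { return 0; }

namespace as_templates {
    template <typename T> struct A { static const int C = 1; };
    template <typename T> struct D { static int F() { return 1; } };
    struct B {};
    struct E {};

    // A and D name templates here, so < and > are brackets:
    // (A<B>::C) == (D<E>::F())
    bool b() { return A<B>::C == D<E>::F(); }
}

namespace as_values {
    int A = 1, B = 2, D = 3, E = 4;

    // A and D name variables here, so < and > are operators:
    // ((A < B) > ::C) == ((D < E) > ::F())
    bool b() { return A<B>::C == D<E>::F(); }
}

// Small usage check: identical text, different parse, different result.
void demo() {
    assert(as_templates::b());
    assert(!as_values::b());
}
```

Compilers may warn about the chained comparisons in the second namespace, which is a fair hint that nobody writes this on purpose, but both functions are well-formed.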

Using < and > as both matching and non-matching tokens is a disaster. As to the !() making the D usage longer: for the common case of having a single argument, the () are optional, e.g. this is legal:

Set!int foo;

score 2 · Answer 6 · edited Jul 29 '12 at 22:21

I believe those were the only cases.

However, it's not so much a user problem as it is an implementer problem. This seemingly trivial difference makes it much harder to build a correct parser for C++ (as compared to D). D was also designed to be implementer-friendly, and as such they tried their best to avoid making ambiguous code possible.

(Side note: I do find the shift-exclamation point combination to be somewhat awkward... one advantage of angle brackets is definitely ease of typing!)

score 1 · Answer 7 · edited May 23 '17 at 12:13

>= greater-than or equals ambiguity is another case that wasn't mentioned:

Fails:

template <int>
using A = int;
void f(A<0>=0);

Works:

void f(A<0> =0);

I think that, unlike >>, this did not change in C++11.

See this question for more details: Why does the template-id in "A<0>=0" not compile without space because of the greater-or-equal-than operator ">="?

answered May 09 '16 at 19:02

Ciro Santilli OurBigBook.com

score 0 · Answer 8 · answered Jan 31 '14 at 23:01

Ultimately, what any compiler has to do is translate your semi-English source code, in whatever language, into the real machine code a computer can actually operate on. This is ultimately a series of incredibly complex mathematical TRANSFORMS.

Well, mathematics tells us that the mappings we need for compilation are "onto" or "surjective". All that means is that every legal program CAN be mapped unambiguously to assembly. This is what language keywords and punctuation like ";" exist for, and why every language has them. However, languages like C++ use the same symbols like "{}" and "<>" for multiple things, so the compiler has to add extra steps to produce the overall, onto transform it needs (this is what you're doing in linear algebra when you multiply matrices). That adds to compile times, introduces significant complexity that itself can harbor bugs, and can limit the compiler's ability to optimize the output.

For example, Stroustrup could've used '@' for template arguments: it was an unused character that would've been perfect for letting compilers know that "this is, and only ever will be, some kind of template". That is actually a 1-to-1 transform, which is perfect for analytic tools. But he didn't; he used symbols that already mapped to greater-than and less-than. That alone immediately introduces ambiguity, and it only gets worse from there.

It sounds like "D" decided to make the sequence '!()' a special symbol, used only for templates, like my '@' example above. I'm willing to guess that its highly templated code compiles faster and with fewer bugs as a result.
