64

What is the main argument in favor of re-using short keywords (and adding context-dependent meanings) instead of just adding more keywords?

Is it just that you want to avoid breaking existing code that may already be using a proposed new keyword, or is there a deeper reason?

The new "enum class" in C++11 got me thinking about this, but this is a general language design question.

Jonathan Wakely
Andrew Wagner
  • `auto` and `static` are also cases of that. But if you consider that `override` is now a keyword, lots of code is using this as a plain variable name. AFAIK it builds because the grammar can distinguish where it's a keyword and where it's not. – v.oddou Sep 24 '15 at 09:19
  • 11
    `override` is not a keyword, it's an identifier with special meaning. Keywords, by definition, are always keywords and cannot be used for any other purpose. – Jonathan Wakely Sep 24 '15 at 09:22
  • @JonathanWakely some languages disagree with this strict definition. C# has contextual keywords. They are keywords only in some contexts. For backwards compat sometimes a keyword is "disabled" if there's a colliding type name (e.g. `var`). – usr Sep 24 '15 at 12:44
  • 3
    @usr, right, but the question is tagged [tag:c++11] and v.oddou was talking about the C++ `override` keyw^W identifier very specifically in the context of C++ ("`override` is now a keyword"). – Jonathan Wakely Sep 24 '15 at 12:57
  • @JonathanWakely OK, I thought you were speaking about the concept of keywords in general when you said "Keywords, by definition". – usr Sep 24 '15 at 12:59
  • 2
    Nope, I meant the definition in the C++ standard, which says "The identifiers shown in Table 3 are reserved for use as keywords (that is, **they are unconditionally treated as keywords in phase 7**) except in an attribute-token (7.6.1)" (so actually they can be used for other purposes, but only inside attributes. And inside string literals. But apart from that what did the Romans ever do for us?) – Jonathan Wakely Sep 24 '15 at 13:01
  • 2
    I've edited the title to be specific to C++ and voted to reopen. It's possible that SO users have varied opinions about this, but the C++ committee do not. They have a longstanding, well-documented policy about adding new keywords, so a fact-based answer is entirely possible (and has been given!) – Jonathan Wakely Sep 24 '15 at 15:44
  • 1
    Also relevant [Why are override and final identifiers with special meaning instead of reserved keywords?](http://stackoverflow.com/q/30404388/1708801) – Shafik Yaghmour Sep 24 '15 at 19:27

5 Answers

69

Is it just that you want to avoid breaking existing code that may already be using a proposed new keyword, or is there a deeper reason?

No, that's the reason.

Keywords, by definition, are always considered keywords wherever they occur in the source, so they cannot be used for other purposes. Making something a keyword breaks any code that might be using that token as a variable, function, or type name.

The C committee take a different approach and add new keywords using _Reserved names, e.g. _Atomic, _Bool, and then they add a new header (<stdatomic.h>, <stdbool.h>) with a nicer macro, so that you can choose whether to include the header to get the name atomic or bool, but it won't be declared automatically and won't break code that happens to be using those names already.

The C++ committee don't like macros and want new features to be proper keywords, so they either re-use existing keywords (such as auto), add context-dependent "keywords" (which are not really keywords but "identifiers with special meaning", such as override, so they can still be used for other things), or use strange spellings that are unlikely to clash with user code (such as decltype instead of the widely supported typeof extension).
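
A minimal sketch of the "identifiers with special meaning" option, assuming a C++11 compiler. The names `Base`, `Derived`, `legacy_sum`, and `check_contextual` are hypothetical and only for illustration. It shows `override` and `final` acting as specifiers in their special positions while remaining usable as ordinary identifiers elsewhere:

```cpp
#include <cassert>

struct Base {
    virtual ~Base() = default;
    virtual int value() const { return 1; }
};

// In these positions `final` and `override` have their special meaning.
struct Derived final : Base {
    int value() const override { return 2; }
};

// Older code like this keeps compiling, because neither word is reserved.
int legacy_sum() {
    int override = 40;  // ordinary variable name
    int final = 2;      // ordinary variable name
    return override + final;
}

// Small self-check covering both uses.
void check_contextual() {
    Derived d;
    const Base& b = d;
    assert(b.value() == 2);      // Derived::value overrides Base::value
    assert(legacy_sum() == 42);  // the identifiers behave like any other name
}
```

Had `override` been made a true keyword, `legacy_sum` would have stopped compiling, which is the breakage the committee tries to avoid.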

Jonathan Wakely
  • 13
    Among the reasons the C++ committee don't like macros for this purpose, I think, is that the C idea wouldn't work in C++ because of the way that in C++ any standard header is allowed to include any other. So even if you didn't include `` or whatever it was called, your existing code that used `decltype` for another purpose might break anyway because `` pulls in the macro. Since the committee also doesn't much like macros in general, I doubt there's any appetite to define their way out of that awkwardness by restricting which standard headers can include each other. – Steve Jessop Sep 24 '15 at 13:34
  • Also, sometimes the committee go ahead and add `std::complex::operator""if(float)` to the language. – Simon Richter Sep 24 '15 at 15:20
  • 1
    @SimonRichter, but that can't break anything, because it reuses a keyword, which users couldn't have used for identifiers anyway. And the alternative names suggested for that operator were awful! – Jonathan Wakely Sep 24 '15 at 15:39
  • 2
    wrt to override and final [Why are override and final identifiers with special meaning instead of reserved keywords?](http://stackoverflow.com/q/30404388/1708801) is also relevant. It just adds more details to what you already said. – Shafik Yaghmour Sep 24 '15 at 19:29
  • 5
    It isn't "**just** that you want to avoid breaking existing code". Not breaking the tens of billions of lines of existing C++ is very important to the C++ standards committee, and many of its millions of users. See – Ian Sep 25 '15 at 14:09
17

Some old languages did not have keywords at all, in particular PL/1 where

IF IF=THEN THEN BEGIN;
  /* some more code */
END;

was a legal piece of code, but completely unreadable. (Look also into APL as an example of write-mostly programming language, which is completely cryptic to read a few months later, even by the code's original author).

The C and C++ language family has a set of keywords defined by the language specification. But these are very widely used languages with billions of lines of legacy source code. If you (or the standardization committee) add a new keyword, there is a chance of collisions with some existing program, and as you guessed and others answered, this is bad. So if the standard had added, for instance, enum_class as a new keyword, chances are that someone would already have used it as an identifier, and they would be unhappy to have to change their code when adopting a new C++ standard.

Also, C++ is widely known to be slow to parse (in particular because standard headers like <vector> pull in tens of thousands of lines of source code, because modules are not in C++ yet, and because the syntax is strongly ambiguous), so complicating the parser to handle new syntax is not a big deal (parsing C++ has always been horrible anyway). For example, the GCC community is working much harder on new optimizations than on new C++ features (apparently, recent features of the C++ standard library require much more work than parsing new syntax), even if the jump from C++03 to C++11 was a huge one and required a lot of work in the C++ front end. This is less true for the jump from C++11 to C++14.

Some other languages (e.g. some dialects of Lisp such as Common Lisp and some Schemes, where you could redefine a let or if macro) permit the redefinition of existing keywords, but this can make the source code difficult to understand a few months later. Macros in homoiconic languages like these operate on ASTs, so they are very different from the crude textual substitution mechanism in C or C++; read also about hygienic macros.

Basile Starynkevitch
  • 6
    Fortran fixed-format source code is even more fun for bad typists. `do 100 i = 1, 5` is the start of a loop but `do 100 i = 1. 5` is an assignment statement. Blanks are ignored, so `do 100 I` is the same variable as `do100i`. – alephzero Sep 24 '15 at 19:33
  • 3
    One of the fun parts of PL/1. Only to be done if you were feeling really cruel. Such as IF IF=THEN THEN THEN=ELSE; ELSE ELSE=THEN; . Happy times. – Kickstart Sep 25 '15 at 09:29
  • Rebinding Scheme keywords with `let` or `lambda` isn't too terrible. Where things get dicey is when keywords are recognized by macros. Such macros tend to be fairly ill-behaved, and therefore not often used. – dfeuer Sep 25 '15 at 15:17
  • 1
    At least in Common Lisp, symbols take their meaning from the package in which they're bound, so if some future CL standard adds symbols, it needs only to place them in a different package than the COMMON-LISP (alias CL) package. `(defpackage my-library (:use :cl))` eg, cl21.org redefines much of the language, but each package can decide whether to “use” package CL or CL21, so there is no conflict with existing code. It's a bit as though one had a declaration in every C/C++ file like `using standard c90;` going back 25 years. – BRPocock Oct 01 '15 at 15:31
10

I think it's mainly because adding keywords will break existing code that happens to use this keyword in other contexts, as you suggest.

Stefan Haustein
10

Is it just that you want to avoid breaking existing code that may already be using a proposed new keyword, or is there a deeper reason?

By definition, a keyword is a special token which cannot be used anywhere else; as a result, introducing a keyword breaks any code that happened to use an identifier with the given spelling.

Some languages use the term contextual keyword to refer to spellings that are only interpreted as keywords in specific contexts. If no "wild" identifier could previously be used in this context, then it is guaranteed that the introduction of the contextual keyword will not break existing code. For example, since no identifier can appear immediately after the closing parenthesis in a function signature, this is a place where one can introduce so-called contextual keywords (such as override or final).

On the other hand, in places where any identifier was previously allowed, adding a keyword poses a risk. For example:

  • struct H { my_type f; enum { g }; };: the use of enum class rather than a new keyword is because any new word could be mistakenly taken as the start of a data member declaration in this context; only a keyword is unambiguous (in LL(1)), and introducing a new one could break code.
  • void h() { my_type f; auto x = g(); }: the use of auto rather than a new keyword is because any new word could clash with an existing type. It's a surprising choice still, since it was already a keyword usable in this position in C (defaulting to int type) but its meaning was altered (the justification was the low probability of its usage).
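
The two bullets above can be sketched as compilable C++11. This is only an illustration: `my_type` is aliased to `int`, `g` returns a `double`, the `Mode` enumeration is a hypothetical addition, and `h` returns `bool` here, none of which comes from the answer.

```cpp
#include <cassert>
#include <type_traits>

using my_type = int;  // hypothetical stand-in for the user-defined type

struct H {
    my_type f;                       // a member declaration led by an identifier
    enum class Mode { fast, safe };  // `enum` and `class` were both already reserved
    Mode mode;
};

double g() { return 1.5; }  // hypothetical helper

bool h() {
    auto x = g();  // C++11 meaning: the type is deduced from the initializer
    static_assert(std::is_same<decltype(x), double>::value,
                  "auto deduces double from g()");
    H obj{0, H::Mode::safe};  // aggregate initialization of f and mode
    assert(obj.f == 0);
    return x > 1.0 && obj.mode == H::Mode::safe;
}
```

Because `enum`, `class`, and `auto` could never have been user identifiers, neither construct can collide with a name in existing code.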

As some have mentioned, languages can be designed without keywords entirely (Haskell comes pretty close), or made in a way that keywords can be introduced seamlessly (for example, if every declaration already starts with a keyword, then introducing a new keyword cannot clash). It just so happens that C and C++ were not made so, and indeed neither were many C-like languages.

Matthieu M.
  • The `auto` keyword could not be used in that context, since a declaration with no explicit type was illegal. `auto int x = 42;` was a valid declaration under the old rules. `auto x = 42;` was a syntax error under the old rules, and equivalent to `int x = 42;` under the new rules. But yes, the similarity of the two contexts could be confusing (if not for the fact that `auto` in the old sense was almost never used.) – Keith Thompson Sep 24 '15 at 17:43
  • @KeithThompson: Ah sorry, the issue comes from [compatibility with C](http://ideone.com/K2bImg) which allows `auto i = 0;`. – Matthieu M. Sep 24 '15 at 19:19
  • 2
    Right -- but only in C90 and earlier. C99 dropped the "implicit `int`" rule, making `auto i = 0;` illegal. What's interesting is that in pre-C99 C, `auto i = 0.0` makes `i` an `int`, while in C++11 and later the same declaration makes `i` a `double`. – Keith Thompson Sep 24 '15 at 19:23
-11

Mistaken enthusiasm of "less is more". It is thought (incorrectly) that by using fewer keywords, programmers would have to learn less and can be more productive sooner. But this only creates confusion about the syntax.

"Real Perl programmers prefer things to be visually distinct." ---- Larry Wall

In other words, use a keyword for one task only.

shawnhcorey
  • 8
    I don't believe that's the reason. Do you have any evidence that it is, or that anyone on the committee believes that having fewer keywords makes the language easier to learn? As far as I know, the *entire* reason for re-using existing keywords is to avoid breaking existing code. It's acknowledged that things like re-using `static` cause *additional* confusion, but that's a price worth paying. – Keith Thompson Sep 24 '15 at 17:41
  • @KeithThompson: How can reusing a keyword for a new feature break old code? – shawnhcorey Sep 24 '15 at 22:25
  • It can't; that's the whole point. Inventing a new keyword would break old code that uses that word as an identifier. Reusing an existing keyword avoids that. *That*, I think, is the motivation for reusing existing keywords, not some "less is more" philosophy. – Keith Thompson Sep 24 '15 at 22:40
  • Not that I agree, but this is the _only_ quoted answer so far. Why all the downvotes? – GOTO 0 Sep 25 '15 at 06:35
  • @GOTO0 perhaps because the quote is not relevant to C++. Some may see a misleading quote as worse than none at all. – trichoplax is on Codidact now Sep 25 '15 at 07:48
  • @trichoplax Yes, I thought of that. Indeed, the [original question](http://stackoverflow.com/revisions/32757571/1) was about language design in general, and it was changed later to make it less broad. – GOTO 0 Sep 25 '15 at 07:56
  • 2
    @GOTO0 Good point. I see that this question was restricted to C++ 2 hours after this answer was posted. Even before the edit I can understand downvotes though - the quote is evidence for the opposite of what the question is asking about, but is presented as evidence that language designers are making a misguided decision. This suggests that there is no good reason for making this design decision, which other answers show is not the case. – trichoplax is on Codidact now Sep 25 '15 at 08:00
  • 1
    @trichoplax: So don't disagree with the herd. Last time I'll ever read stack. – shawnhcorey Sep 25 '15 at 11:30
  • 4
    If you look at the history of the original question, it started as a vague "why do people do things". This was eventually closed, as it was opinion-based/broad (nobody can answer why every language designer does a certain kind of thing). It was then narrowed (around the example) to be about C++. Amusingly, the reason why C++ reuses keywords is both narrow and well documented, so the question was reopened. In the context of the original question, your answer was a poor one (Don't do X is a poor answer to Why X?). In the context of the C++ question, your answer is horrible. Hence, downvotes. – Yakk - Adam Nevraumont Sep 25 '15 at 13:57
  • 1
    As the original question is dead, a good approach is to delete your answer as it no longer applies. A poor approach is to ragequit when people find an answer you post to be a poor one. – Yakk - Adam Nevraumont Sep 25 '15 at 13:58
  • @Yakk thanks for your comment. You say that _the reason why C++ reuses keywords is [...] well documented_, but none of the answers here present any credible evidence for that argument (a link to an authoritative external source - comparable to Larry Wall - or an official reference would be fine). I might be mistaken, but I believe what other answers show is a very popular but AFAICS unattested opinion. I'd be glad to be proven wrong. – GOTO 0 Sep 25 '15 at 15:27
  • 3
    @GOTO0 Larry Wall is not authoritative on why C++ reused keywords. An opinion by someone that it is a bad idea to reuse keywords is off topic for a question about why languages reuse keywords, even before it was narrowed to C++. I will admit nobody has quoted the minutes of a standardization meeting, or official comments on a proposal, in the other answers. – Yakk - Adam Nevraumont Sep 25 '15 at 15:31