32

For example (in C):

int break = 1;
int for = 2;

Why will the compiler have any problems at all in deducing that break and for are variables here?


So, we need keywords because

  • we want the programs to be readable
  • we do not want to over-complicate the job of already complex compilers of today
  • but most importantly, a language is a lot more powerful if some 'key'words are reserved for special actions. Then the language can aim to be useful at a higher level rather than getting bogged down in implementing a for loop unambiguously.
tshepang
Lazer
  • 3
    The compiler just doesn't want to make the effort :) – Arsen Mkrtchyan Mar 16 '10 at 05:53
  • 7
    FYI, there are languages that have no keywords (Lisp and Smalltalk, off the top of my head). I imagine keywords make the language simpler to parse. And I'd bet there are some cases where keywords are required for disambiguation. – Sasha Chedygov Mar 16 '10 at 05:54
  • 1
    @musicfreak: What is car and cdr? – Heath Hunnicutt Mar 16 '10 at 05:57
  • 4
    My other car is a cdr: http://stackoverflow.com/questions/1864795/what-does-my-other-car-is-a-cdr-mean – Alok Singhal Mar 16 '10 at 05:58
  • @musicfreak: I've never used Lisp, but taking a quick look at it... I'd suppose variables in Lisp can't contain parentheses. =P – Kache Mar 16 '10 at 06:01
  • @Kache4: They can contain many punctuation chars other languages forbid in identifiers, but you have to draw the line *somewhere* in order to lex the source code. –  Mar 16 '10 at 06:25
  • @Kache4: I wouldn't call parentheses keywords. :P – Sasha Chedygov Mar 16 '10 at 06:43
  • Anyway, +1, very good question, especially for those interested in writing their own language. – Sasha Chedygov Mar 16 '10 at 06:44
  • My PARLANSE language (www.semanticdesigns.com/Products/PARLANSE) allows any language character in an identifier, but you sometimes have to escape it. So ~(left_paren~) is an identifier with the ( ) escaped. The reason we did this is to allow identifiers that can give names to arbitrary items found in BNF grammars, by generating such names from whatever the grammar token name is in a regular way. – Ira Baxter Mar 16 '10 at 07:34
  • Your particular example is fine. But, a language has to deal with all possible examples... – comingstorm Mar 16 '10 at 08:22
  • 1
    @musicfreak: The following are keywords in Lisp: and, begin, case, cond, define, delay, do, else, if, lambda, let, letrec, or, quasiquote, quote, set, unquote, unquote-splicing. I thought car and cdr were keywords, but I see that they and cons are not. However, as they are implemented by intrinsic functions, each implementation either reserves car, cdr, cons or perhaps first, rest, concat. – Heath Hunnicutt Mar 16 '10 at 15:20
  • @musicfreak: Scheme also has reserved keywords: define, else, etc. – Heath Hunnicutt Mar 16 '10 at 15:22
  • 1
    @musicfreak: Smalltalk keywords: true, false, nil, self, super and thisContext – Heath Hunnicutt Mar 16 '10 at 16:06
  • @Alok: Thanks but I wasn't truly asking what they are -- I was wrongly implying they are Lisp reserved words. Technically, they are not, but on the other hand, there are reserved keywords in Lisp. – Heath Hunnicutt Mar 16 '10 at 16:07
  • Because "keywords" keep SEOs employed! hahaha. Just google it. – gonzobrains Sep 29 '11 at 17:28

13 Answers

29

It's not necessary -- Fortran didn't reserve any words, so things like:

if if .eq. then then if = else else then = if endif

are completely legal. This not only makes the language hard for the compiler to parse, but often makes it almost impossible for a person to read or spot errors. For example, consider classic Fortran (say, up through Fortran 77 -- I haven't used it recently, but at least hope they've fixed a few things like this in more recent standards). A Fortran DO loop looks like this:

DO 10 I = 1,10

Without them being side-by-side, you can probably see how you'd miss how this was different:

DO 10 I = 1.10

Unfortunately, the latter isn't a DO loop at all -- it's a simple assignment of the value 1.10 to a variable named DO10I (yes, blanks are not significant in fixed-form Fortran, so DO 10 I is just another way of writing that name). Since Fortran also supports implicit (undeclared) variables, this is (or was) all perfectly legal, and some compilers would even accept it without a warning!
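To make the comma-versus-period problem concrete, here is a minimal sketch in C of the decision a fixed-form Fortran front end has to make. It is not how any real compiler is written: the names classify_statement and squeeze_blanks are hypothetical, and the sketch assumes a single short statement with no string literals or labels. It only shows that, once blanks are squeezed out, the sole thing separating a DO loop from an assignment is a comma after the '=' that is not nested inside parentheses.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

enum stmt_kind { STMT_ASSIGNMENT, STMT_DO_LOOP, STMT_OTHER };

/* Copy src into dst without blanks, upper-casing letters, because blanks
   are not significant in fixed-form Fortran. dst needs strlen(src) + 1 bytes. */
static void squeeze_blanks(const char *src, char *dst)
{
    for (; *src != '\0'; ++src) {
        if (*src != ' ' && *src != '\t')
            *dst++ = (char)toupper((unsigned char)*src);
    }
    *dst = '\0';
}

/* Hypothetical classifier for one statement of at most 131 characters. */
enum stmt_kind classify_statement(const char *line)
{
    char buf[132];
    if (strlen(line) >= sizeof buf)
        return STMT_OTHER;
    squeeze_blanks(line, buf);

    const char *eq = strchr(buf, '=');
    if (eq == NULL)
        return STMT_OTHER;              /* no '=' at all: neither form */

    /* Look for a comma after '=' that is not inside parentheses. */
    int depth = 0;
    int top_level_comma = 0;
    for (const char *p = eq + 1; *p != '\0'; ++p) {
        if (*p == '(')
            ++depth;
        else if (*p == ')')
            --depth;
        else if (*p == ',' && depth == 0)
            top_level_comma = 1;
    }

    /* "DO10I=1,10" is a loop; "DO10I=1.10" is an assignment to DO10I. */
    if (strncmp(buf, "DO", 2) == 0 && top_level_comma)
        return STMT_DO_LOOP;
    return STMT_ASSIGNMENT;
}
```

With this sketch, classify_statement("DO 10 I = 1,10") yields STMT_DO_LOOP while classify_statement("DO 10 I = 1.10") yields STMT_ASSIGNMENT, so the statement's meaning is not known until the scanner has read past the '='.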

Jerry Coffin
  • 1
    http://stackoverflow.com/questions/1995113/strangest-language-feature/2002154#2002154 - I am sure you know this already though :-) – Alok Singhal Mar 16 '10 at 06:12
  • 1
    I didn't know it had been posted here, but it doesn't surprise me. The other thing to keep in mind is that ',' and '.' are right next to each other, so it wasn't even all that rare of a problem (back when I was writing Fortran, you ran into it about once every three months or so -- not as often as mis-counted Hollerith constants, but still often enough that it was something you checked if a loop misbehaved). – Jerry Coffin Mar 16 '10 at 06:26
  • Another good confusing example, say there's a three dimensional array if, and you are doing an arithmetic if: `if(if(1,2,3))1,2,3` – mpez0 Mar 19 '10 at 17:00
  • and let's not forget Python (pre-3.0): `True = False` – BlueRaja - Danny Pflughoeft Jul 14 '10 at 22:57
20

Then what will the computer do when it comes across a statement like:

while(1) {
  ...
  if (condition)
    break;
}

Should it actually break out of the loop? Or should it treat break as the variable from your example and evaluate the statement as 1;?

The language would become ambiguous in certain cases, or you'd have to create a very smart parser that can infer subtle syntax, and that's just unnecessary extra work.

Kache
8

They don't. PL/1 famously has no reserved keywords; every "keyword" (BEGIN, DO, ...) can also be used as a variable name. But allowing this means you can write really obscure code: IF DO>BEGIN THEN PRINT:=CALL-GOTO; Reserving the "statement keywords" of a language isn't usually a loss if that set of names is modest (as it is in every language I've ever seen except PL/1 :-).

APL also famously has no keywords. But it has a set of some 200 amazing iconic symbols in which to write complicated operators. (The "domino" operator [don't ask!] is a square box with a calculator divide sign in the middle.) In this case, the language designers simply used icons instead of keywords. The consequence is that APL has a reputation of being a "write only" language.

Bottom line: not a requirement, but it tends to make programs a lot more readable if the keywords are reserved identifiers from a small set known to the programmers. (Some languages have insisted that "keywords" start with a special punctuation character like "." to allow all possible identifiers to be used, but this isn't worth the extra trouble to type or the clutter on the page; it's pretty easy to stay away from "identifiers" that match keywords when the keyword set is small.)
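As a rough illustration of the punctuation-marked alternative mentioned in the parenthesis above, the following C sketch classifies a word purely by its first character. The function names and the rule that words consist only of ASCII letters are assumptions made for the example; no particular language's lexical rules are being reproduced.

```c
#include <ctype.h>

enum word_class { WORD_IDENTIFIER, WORD_KEYWORD, WORD_INVALID };

/* Return nonzero if s is a non-empty run of letters. */
static int all_letters(const char *s)
{
    if (*s == '\0')
        return 0;
    for (; *s != '\0'; ++s) {
        if (!isalpha((unsigned char)*s))
            return 0;
    }
    return 1;
}

/* Hypothetical rule: a leading '.' marks a keyword, so every plain
   word, including "if" or "while", stays available as an identifier. */
enum word_class classify_marked_word(const char *word)
{
    if (word[0] == '.')
        return all_letters(word + 1) ? WORD_KEYWORD : WORD_INVALID;
    return all_letters(word) ? WORD_IDENTIFIER : WORD_INVALID;
}
```

Under this scheme ".if" is a keyword and "if" is an ordinary identifier, which removes the ambiguity at the cost of the extra typing and page clutter described above.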

Ira Baxter
  • APL started as a mathematical syntax to describe algorithms. The lack of keywords or text operators actually becomes an advantage in some polylingual institutions, such as ESA or CERN. It is easy to get completely confused, though. – mpez0 Mar 19 '10 at 17:02
  • 6
    Good APL programmers don't get confused at all regarding what the operators are or do. They do get completely confused as what an APL statement is trying to accomplish. – Ira Baxter Mar 20 '10 at 03:54
  • Actually, APL does have keywords, or things perceived as keywords by most APL coders, the system functions and variables. Of course, these all start with the special character {quad}, which is not valid in the name of a variable or a defined function. Many dialects also implement control structures with keywords, but these all start with a colon, which is also not a valid name for a variable or function. – David Siegel Oct 11 '16 at 04:25
  • IIRC, APL\360 had many special functions hidden behind . You might consider these as "keywords" but they aren't traditional. I don't know where "modern" APL (let alone its descendent, "J") went with this. Happy to have fond memories and leave it at that. – Ira Baxter Oct 11 '16 at 04:58
6

Since it's tagged C, the original C language was such that by default any variable was defined as type int.

It means that foo; would declare a variable of type int.

Let's say you do break;. So how does the compiler know whether you want to declare a variable named break or use the keyword break?

Bertrand Marron
4

Several reasons:

  • The keywords may seem unambiguous in your samples. But that is not the only place you would use the variable 'break' or the variable 'for'.

  • Writing the parser would be much harder and more error-prone for little gain.

  • Using a keyword as a function or procedure name in a library may have undesired, possibly security-relevant, side effects.

lexu
  • The example given is weak, since it does not contain a three element arglist, or for that matter, anything that can be considered any sort of arglist. The semicolons ruin it. Try rewriting your example in terms of `if` or `while`, since they take only a single parenthesized expression. – SingleNegationElimination Mar 16 '10 at 06:03
4

As others said, reserving keywords makes it easier for the compiler to parse your source code. But I would like to say a bit more: it can also make your source code more readable; consider this example:

if (if > 0) then then = 10 end if

The second "if" and the second "then" are variables, while the others are keywords. I think this kind of code is not readable. :)

Mouhong Lin
  • Who cares about how hard the compiler has to work, except the compiler guy? He gets outvoted on this topic every time, rightly, IMHO; his job is to make everybody's job easier, not the other way around. This issue is really about usability for coders. – Ira Baxter Mar 17 '10 at 04:53
  • @Ira Baxter: Not just the compiler guy. If you want to write a program analysis tool or refactoring tool or something, you've got to parse the language. Moreover, if you make building parsers expensive, you've got less resources to work on other aspects of the compiler. – David Thornley Mar 19 '10 at 16:41
  • @Thornley: True, you don't want the "front end" to work a lot harder if you can avoid it. However, keywords or not don't change the average cost of parsing; for GLR parsers and virtually every modern language, the parsing cost is linear with a small constant (I've implemented dozens of front ends http://www.semanticdesigns.com/Products/DMS/FrontEnds.html with a GLR parser and validate this empirically). The analysis step is usually much more costly as it requires a lot more inference, at least if it does anything interesting. – Ira Baxter Mar 20 '10 at 03:51
2

The compiler would have problems if you write something like this:

while(*s++);
return(5);

Is that a loop or a call to a function named while? Did you want to return the value 5 from the current function, or did you want to call a function named return?

It often simplifies things if constructs with special meaning simply have special names that can be used to unambiguously refer to them.
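A minimal sketch of what "special names" buy the implementation, assuming a lexer that has already isolated a word: with reserved words, telling a keyword from an identifier is a single table lookup that needs no surrounding context. The table below is a small illustrative subset of C's keywords, and classify_word is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

enum token_kind { TOK_IDENTIFIER, TOK_KEYWORD };

/* A small illustrative subset of C's reserved words. */
static const char *const reserved_words[] = {
    "break", "else", "for", "if", "return", "while"
};

/* Classify an isolated word without looking at anything around it. */
enum token_kind classify_word(const char *word)
{
    size_t count = sizeof reserved_words / sizeof reserved_words[0];
    for (size_t i = 0; i < count; ++i) {
        if (strcmp(word, reserved_words[i]) == 0)
            return TOK_KEYWORD;
    }
    return TOK_IDENTIFIER;
}
```

Here classify_word("while") is TOK_KEYWORD no matter where it appears, so while(*s++); can only be a loop; a language without reserved words would have to consult the parser state or the symbol table before making the same call.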

sth
  • Many languages that don't reserve keywords handle cases like these by treating them as context-dependent keywords (even C# does this). That is, where the word might be interpreted specially (e.g., as a "while statement") it is treated as such; where not, it isn't a problem. Your while statement above isn't a problem; it clearly can't be a while statement, as there is no body to execute. The return statement example you give is good; it would clearly be ambiguous without a context-keyword rule, which would make it a return statement. – Ira Baxter Mar 16 '10 at 06:07
  • 2
    @Ira: At least in C (as the question is tagged) this is a valid while-statement. That the body of the `while` is empty is no problem, it just won't execute anything there. – sth Mar 16 '10 at 06:15
  • Yes, you're right because of C's null statement syntax. You can still resolve it by context dependence without having real keywords: this is now clearly a "while" loop :-} You'd have to use (*(&while))(*s++); to force a function call. – Ira Baxter Mar 16 '10 at 07:26
2

If we are speaking of C++, it already has a very complicated grammar. Allowing keywords to be used as variable names, for example, would make it even more complicated.

Sad Developer
2

Because we want to keep what little sanity points we've got:

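void myfunction(bool) { /* ... */ }

typedef void (*funcp)(bool);   // the pointer type the example relies on
funcp while = &myfunction;     // only legal if while were not reserved
while(true);                   // a call to myfunction, or an infinite loop?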
Macke
  • You forgot to define true = false. On a related note, "#define while myfunction" works in C. – josefx Mar 19 '10 at 15:29
  • define is a different thing from keywords. – Macke Mar 19 '10 at 15:55
  • just thought that keywords don't help with sanity if you can redefine them. If the preprocessor instructions count as part of the language then only those are reserved, every other keyword can be redefined. – josefx Mar 20 '10 at 00:39
0

I guess the code would look very weird, and the parser would be hard if not impossible to write. E.g.:

int break = 1;
while (true) {
   // code to change break
   if (!break) break;   // not very readable code.
}
fastcodejava
0

Depending on the language definition a compiler may or may not need keywords. When it does not know what to do it can try to apply precedence rules or just fail.
An example:

void return(int i){printf("%d",i);}
public int foo(int a)
{
  if(a > 2)return (a+1)*2;
  return a + 3;
}

What happens if a is greater than 2?

  • The language specification may require the compiler to fail
  • The language specification may require the compiler use the return function
  • The language specification may require the compiler to return

You can define a language which doesn't use keywords. You can even define a language which allows you to replace all symbols (since they are only very short keywords themselves).
The problem is not the compiler; if your specification is complete and error-free, it will work. The problem is PEBKAC: programs using this feature of the language will be hard to read, as you have to keep track of the symbol definitions.
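The three possible rulings listed above can be pictured as a policy that the language specification fixes once and the compiler then follows mechanically. The sketch below is only an illustration in C; the enum and function names are invented for the example and do not come from any real compiler.

```c
#include <stdbool.h>

/* The three rulings a specification could make (hypothetical names). */
enum conflict_policy {
    POLICY_REJECT,            /* the compiler must fail */
    POLICY_PREFER_FUNCTION,   /* call the user's return function */
    POLICY_PREFER_STATEMENT   /* perform an ordinary return */
};

enum resolution { RESOLVE_ERROR, RESOLVE_CALL, RESOLVE_RETURN_STATEMENT };

/* Decide what `return (expr)` means. user_defined_return tells whether
   the program declared its own function named return. */
enum resolution resolve_return(enum conflict_policy policy,
                               bool user_defined_return)
{
    if (!user_defined_return)
        return RESOLVE_RETURN_STATEMENT;   /* no conflict to settle */

    switch (policy) {
    case POLICY_REJECT:
        return RESOLVE_ERROR;
    case POLICY_PREFER_FUNCTION:
        return RESOLVE_CALL;
    case POLICY_PREFER_STATEMENT:
        return RESOLVE_RETURN_STATEMENT;
    }
    return RESOLVE_ERROR;                  /* not reached for valid policies */
}
```

Whichever policy is chosen, the compiler's job stays simple; the burden falls on the reader, who has to know both the policy and whether a return function is in scope to tell what return (a+1)*2; does.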

josefx
0

FWIW, Tcl doesn't have any reserved words. You can have variables and functions named "if", "break", etc. The interpretation of a token is totally dependent on the context. The same token can represent a command in one context, a variable in another, or a literal string in another.

Bryan Oakley
0

In many cases, it would be possible for the compiler to interpret keywords as normal identifiers, like in your example:

int break = 1;
int for = 2;

As a matter of fact, I just wrote a compiler for a simple assembly-like toy language which does this, but warns the user in such cases.
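The toy compiler itself is not shown here, so the following is only a guess at the general idea, written in C with hypothetical names: a word that spells a keyword is accepted as an identifier when it sits in a declarator position (directly after a type name), and a warning is issued. The single flag previous_was_type stands in for whatever context a real parser would track.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

enum word_role { ROLE_IDENTIFIER, ROLE_KEYWORD };

/* Does the word spell one of the (illustrative) keywords? */
static bool spells_keyword(const char *word)
{
    static const char *const keywords[] = { "break", "for", "if", "while" };
    for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; ++i) {
        if (strcmp(word, keywords[i]) == 0)
            return true;
    }
    return false;
}

/* previous_was_type: the preceding token was a type name such as int.
   *warnings is incremented each time a keyword is used as a name. */
enum word_role classify_in_context(const char *word, bool previous_was_type,
                                   int *warnings)
{
    if (!spells_keyword(word))
        return ROLE_IDENTIFIER;
    if (previous_was_type) {
        fprintf(stderr, "warning: '%s' used as an identifier\n", word);
        ++*warnings;
        return ROLE_IDENTIFIER;
    }
    return ROLE_KEYWORD;
}
```

This handles int break = 1; because the context is unambiguous there. It says nothing about a bare break; inside a loop, where the context alone gives no way to choose between the variable and the statement.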

But sometimes the syntax is defined in a way that keywords and identifiers are ambiguous:

int break;

while(...)
{
    break; // <-- treat this as expression or statement?
}

And the most obvious reason is that editors will emphasize keywords so that the code is more readable for humans. Allowing keywords to be treated as identifiers would make code highlighting harder, and would also lead to bad readability of your code.

AndiDog