64

I've posted this on the D newsgroup some months ago, but for some reason, the answer never really convinced me, so I thought I'd ask it here.


The grammar of D is apparently context-free.

The grammar of C++, however, isn't (even without macros). (Please read this carefully!)

Now granted, I know nothing (officially) about compilers, lexers, and parsers. All I know is from what I've learned on the web.
And here is what (I believe) I have understood regarding context, in not-so-technical lingo:

The grammar of a language is context-free if and only if you can always understand the meaning (though not necessarily the exact behavior) of a given piece of its code without needing to "look" anywhere else.

Or, in even less rigor:

The grammar cannot be context-free if I need I can't tell the type of an expression just by looking at it.

So, for example, C++ fails the context-free test because the meaning of confusing<sizeof(x)>::q < 3 > (2) depends on the value of q.

So far, so good.

Now my question is: Can the same thing be said of D?

In D, hashtables can be created through a Value[Key] declaration, for example

int[string] peoplesAges;   // Maps names to ages

Static arrays can be defined in a similar syntax:

int[3] ages;   // Array of 3 elements

And templates can be used to make them confusing:

template Test1(T...)
{
    alias int[T[0]] Test;
}

template Test2(U...)
{
    alias int[U] Test2;  // LGTM
}

Test1!(5) foo;
Test1!(int) bar;
Test2!(int) baz;  // Guess what? It's invalid code.

This means that I cannot tell the meaning of T[0] or U just by looking at it (i.e. it could be a number, it could be a data type, or it could be a tuple of God-knows-what). I can't even tell if the expression is grammatically valid (since int[U] certainly isn't -- you can't have a hashtable with tuples as keys or values).

Any parsing tree that I attempt to make for Test would fail to make any sense (since it would need to know whether the node contains a data type versus a literal or an identifier) unless it delays the result until the value of T is known (making it context-dependent).

Given this, is D actually context-free, or am I misunderstanding the concept?

Why/why not?


Update:

I just thought I'd comment: It's really interesting to see the answers, since:

  • Some answers claim that C++ and D can't be context-free
  • Some answers claim that C++ and D are both context-free
  • Some answers support the claim that C++ is context-sensitive while D isn't
  • No one has yet claimed that C++ is context-free while D is context-sensitive :-)

I can't tell if I'm learning or getting more confused, but either way, I'm kind of glad I asked this... thanks for taking the time to answer, everyone!

Community
  • 1
  • 1
user541686
  • 205,094
  • 128
  • 528
  • 886
  • So, what this has to do with c++? – BЈовић Aug 08 '11 at 13:16
  • 4
    @VJo: It's comparing D's grammar to that of C++ (context-free vs. context-dependent). I tagged C++ instead of a different language because D's website compares itself to C++. – user541686 Aug 08 '11 at 13:17
  • Ah, reminds me of Formal Languages and Automata Theory in college. –  Aug 08 '11 at 13:17
  • http://stackoverflow.com/questions/898489/what-programming-languages-are-context-free –  Aug 08 '11 at 13:19
  • I don't think int[T] and int[0] is ambiguous — identifiers probably cannot start with digits, and you probably cannot use variables inside there, so it's either type[type (identifier)] or type[size (constant literal)], resolvable with single token lookahead. Whether it makes sense with regard to concrete types is not a parser's job to determine. – Cat Plus Plus Aug 08 '11 at 13:36
  • 5
    Are we talking about templates, or the syntax in general? I think D has both pointers and multiplication, so what is `a * b`? – Bo Persson Aug 08 '11 at 13:43
  • 1
    @Cat Plus Plus: In D it's fully valid for a type to use any compile time evaluatable expressions. – BCS Aug 08 '11 at 13:50
  • While it was mentioned here is the link: http://www.digitalmars.com/d/archives/digitalmars/D/learn/Context-Free_Grammars_What_about_arrays_24572.html – he_the_great Aug 08 '11 at 14:15
  • C++ Templates and Macros are useful, but, makes a compiler much difficult to implement – umlcat Aug 08 '11 at 15:02
  • 2
    @Bo Persson; As a statement `a*b;` and `a*b=exp;` both get parsed as pointer declarations. If you put `a*b` in an r-value position, it becomes a multiplication expressions. While this is not shown in the documented grammar, it is well defined and could be resolved at the grammar level. – BCS Aug 08 '11 at 15:30
  • 3
    @BCS - Ok, so the grammar is context free, you just cannot tell that from the grammar? I can see why this question was asked. :-) – Bo Persson Aug 08 '11 at 15:30
  • 1
    @Bo Persson: you can't tell it from the *documented* grammar. :b – BCS Aug 08 '11 at 16:25
  • 7
    fun how everyone presents his own theory of computer science in here. I'm eager to know who is right. I have no clue at all. I better not guess. – Johannes Schaub - litb Aug 08 '11 at 20:30
  • 4
    We need to get Walter to answer this question. – Arlen Aug 09 '11 at 22:19
  • 1
    The problem here is that the OP's informal definition of context-free and idea if what a grammar is are not strict enough. We can't introduce 'meaning' either. Formal grammars are mathematical concepts that know absolutely nothing about meaning and a programming language definition has a lot more to it than a formal grammar. A grammar looked at one way is a set of rules that let you decide whether or not a long string (the program text) is valid syntactically. If it is valid it still might not be meaningful, might not compile, but if it is not valid then it is definitely crap. – Cecil Ward Mar 07 '17 at 13:11
  • Looked at in the opposite direction, a formal grammar tells you how to _generate_ every syntactically correct program. To do this you take abstract symbols mentioned on the left side of one of the rules in the grammar and you replace it with some expanded thing of your choice that conforms to the expansion rule on the right hand side of the rule, and there is some symbol like an equals sign in between. The replacement is like doing a search & replace in a text editor. The replacement text can be anything you like as long as it follows the pattern which is the rhs of that rule. – Cecil Ward Mar 07 '17 at 13:16
  • If you are familiar with regular expressions in search and replace in your favourite word processor or other editor or in programming languages then you already have seen this. Now you know all there is to know about formal grammars. A context free grammar is one where the rules are only allowed to be of this very simple substitution type, and there are no conditions allowed where you might say ‘substitute this way if the lhs occurs in this syntactic context’ but substitute something else if in another syntactic context. – Cecil Ward Mar 07 '17 at 13:26
  • Example: a context-free rule will always look like X -> Y, meaning "whenever you see an X you can replace it with a Y" in generating a program a non-context-free rule might look something like a X b -> Y but c X d -> Z might tell you how to expand X, that is “what to replace X by”, when generating a valid program and it says that in one context, when X has an a before it and a b after it, then replace it with a Y, but in the second context, replace it with a Z. So when you see an X you have to look around it to know what to do. – Cecil Ward Mar 07 '17 at 13:32
  • Now when you are analysing a program as in compiling, you have to somehow work this in the other direction, to look at your text, see if it looks like some potential rhs in a particular rule, if it matches that rhs you replace your text with the lhs. In a non-context-free grammar you have to worry about the lhs being in the correct context, so processing is more difficult. Context-free grammars are popular because they are easy to implement. – Cecil Ward Mar 07 '17 at 13:34
  • I should have pointed out earlier that the rhs’s could be longer strings. There must always be some rules called ‘terminals’. These say that some lhs equals a single literal string. When generating a program you have finished when you have replaced everything with terminals. When analysing you start by matching things in your program text against each of the terminals and then you do reverse replacements rhs->lhs against rules until you get to the lhs of some 'root' rule, if you like, this lhs could be called eg PROGRAM. Then you are done analysing, and the program is valid. – Cecil Ward Mar 07 '17 at 13:49
  • If at some stage you can't find any rule whose rhs matches your text, then you are dead meat. Your program has a syntax error and is not valid according to the formal grammar. In the case of a non-context-free grammar, the context given in the lhs has to match the situation too. You can have grammars with more and more generality in the sense that they allow more powerful, complex and expressive rules. This is called the Chomsky Hierarchy. Increasing levels of nastiness in terms of the challenge of 'parsing' (doing syntactic analysis), ending up as a total pig for the compiler writer. – Cecil Ward Mar 07 '17 at 13:52
  • My apologies, as feel better now. Hope that will be useful to somebody anyway. Wikipedia is your friend. Hope I got most of that correct, have left out a lot of detail. – Cecil Ward Mar 07 '17 at 13:58

7 Answers7

49

Being context free is first a property of generative grammars. It means that what a non-terminal can generate will not depend on the context in which the non-terminal appears (in non context-free generative grammar, the very notion of "string generated by a given non-terminal" is in general difficult to define). This doesn't prevent the same string of symbols to be generated by two non-terminals (so for the same strings of symbols to appear in two different contexts with a different meaning) and has nothing to do with type checking.

It is common to extend the context-free definition from grammars to language by stating that a language is context-free if there is at least one context free grammar describing it.

In practice, no programming language is context-free because things like "a variable must be declared before it is used" can't be checked by a context-free grammar (they can be checked by some other kinds of grammars). This isn't bad, in practice the rules to be checked are divided in two: those you want to check with the grammar and those you check in a semantic pass (and this division also allows for better error reporting and recovery, so you sometimes want to accept more in the grammar than what would be possible in order to give your users better diagnostics).

What people mean by stating that C++ isn't context-free is that doing this division isn't possible in a convenient way (with convenient including as criteria "follows nearly the official language description" and "my parser generator tool support that kind of division"; allowing the grammar to be ambiguous and the ambiguity to be resolved by the semantic check is an relatively easy way to do the cut for C++ and follow quite will the C++ standard, but it is inconvenient when you are relying on tools which don't allow ambiguous grammars, when you have such tools, it is convenient).

I don't know enough about D to know if there is or not a convenient cut of the language rules in a context-free grammar with semantic checks, but what you show is far from proving the case there isn't.

Armen Michaeli
  • 8,625
  • 8
  • 58
  • 95
AProgrammer
  • 51,233
  • 8
  • 91
  • 143
  • 5
    +1 for the point that the dividing like between the pure syntax checks and the semantic check being a somewhat arbitrary choice. – BCS Aug 08 '11 at 15:40
  • I'm sorry but I don't know what "terminal" and "generative" mean. But either way, I'm *really* confused by your statement: *"In practice, no programming language is context-free because things like 'a variable must be declared before it is used' can't be checked by a context-free grammar."* Isn't that a semantic issue (unresolved reference) vs. a syntactic issue ("I can't figure out that this is trying to multiply two things, even though one of them doens't exist")? – user541686 Aug 08 '11 at 15:43
  • 11
    Grammars are a way of describing set of strings of terminals (the set of terminals is the alphabet). Generative grammars are the kind of grammar you are familiar with. One of my points was that the division between lexical, syntactic and semantic leave a large part to the arbitrary and is mostly driven by ease of description and availability of tools. And my second main point was that context-free has a formal definition for grammar and without stating precisely what you are ready to push in the lexical and semantic phase, you are in a "you know what I mean" realm. – AProgrammer Aug 08 '11 at 16:01
  • Not the kind of answer I was looking for, but looks like the best one so far, +1. – user541686 Aug 08 '11 at 20:52
  • 4
    While this answer is correct, some may find it confusing because AProgrammer uses the language of Chomsky, where a grammar "generates" a string (the string is produced by the grammar) instead of the more natural notion of *parsing*: in a compiler we don't care about generating strings, we only care about the reverse process of interpreting the string. – Qwertie Jun 08 '12 at 20:40
23

The property of being context free is a very formal concept; you can find a definition here. Note that it applies to grammars: a language is said to be context free if there is at least one context free grammar that recognizes it. Note that there may be other grammars, possibly non context free, that recognize the same language.

Basically what it means is that the definition of a language element cannot change according to which elements surround it. By language elements I mean concepts like expressions and identifiers and not specific instances of these concepts inside programs, like a + b or count.

Let's try and build a concrete example. Consider this simple COBOL statement:

   01 my-field PICTURE 9.9 VALUE 9.9.

Here I'm defining a field, i.e. a variable, which is dimensioned to hold one integral digit, the decimal point, and one decimal digit, with initial value 9.9 . A very incomplete grammar for this could be:

field-declaration ::= level-number identifier 'PICTURE' expression 'VALUE' expression '.'
expression ::= digit+ ( '.' digit+ )

Unfortunately the valid expressions that can follow PICTURE are not the same valid expressions that can follow VALUE. I could rewrite the second production in my grammar as follows:

'PICTURE' expression ::= digit+ ( '.' digit+ ) | 'A'+ | 'X'+
'VALUE' expression ::= digit+ ( '.' digit+ )

This would make my grammar context-sensitive, because expression would be a different thing according to whether it was found after 'PICTURE' or after 'VALUE'. However, as it has been pointed out, this doesn't say anything about the underlying language. A better alternative would be:

field-declaration ::= level-number identifier 'PICTURE' format 'VALUE' expression '.'
format ::= digit+ ( '.' digit+ ) | 'A'+ | 'X'+
expression ::= digit+ ( '.' digit+ )

which is context-free.

As you can see this is very different from your understanding. Consider:

a = b + c;

There is very little you can say about this statement without looking up the declarations of a,b and c, in any of the languages for which this is a valid statement, however this by itself doesn't imply that any of those languages is not context free. Probably what is confusing you is the fact that context freedom is different from ambiguity. This a simplified version of your C++ example:

a < b > (c)

This is ambiguous in that by looking at it alone you cannot tell whether this is a function template call or a boolean expression. The previous example on the other hand is not ambiguous; From the point of view of grammars it can only be interpreted as:

identifier assignment identifier binary-operator identifier semi-colon

In some cases you can resolve ambiguities by introducing context sensitivity at the grammar level. I don't think this is the case with the ambiguous example above: in this case you cannot eliminate the ambiguity without knowing whether a is a template or not. Note that when such information is not available, for instance when it depends on a specific template specialization, the language provides ways to resolve ambiguities: that is why you sometimes have to use typename to refer to certain types within templates or to use template when you call member function templates.

Nicola Musatti
  • 17,834
  • 2
  • 46
  • 55
  • 1
    I don't see what in your cobol code isn't context free (what follow PICTURE is a picture clause, what follow VALUE is an arithmetic expression, but the relationship isn't long distance enough not to be possible in a context-free grammar). `a(c)` on the other hand as to have a different parse tree depending on the declaration of a, that's the kind of thing painful when possible to put in a CFG. – AProgrammer Aug 08 '11 at 14:54
  • 2
    Your COBOL example is wrong. This is perfectly legal in a context-free grammar. Just because a syntactic element can have multiple interpretations, that does not mean the language is context-sensitive. Look up the definition or read some of the other answers here for more details. – hammar Aug 08 '11 at 15:03
  • `This is ambiguous [...] however it isn't a proof that C++ is not context free` ... you kind of left me hanging in your conclusion. :\ Also, I don't really get what you're trying to say, when you say `a program construct cannot have a different meaning according to where it appears` --> Doesn't my use of `int[T[0]]` have a different meaning (array vs. map) depending on where it's used? They're defined differently in D's syntax, aren't they? – user541686 Aug 08 '11 at 15:08
  • 1
    @Mehrdad: Meaning (aka. semantics) has nothing to do with the context-sensitivity of a language, which is a purely syntactic concept. – hammar Aug 08 '11 at 15:19
  • @hammar: But this *does* seem to be a syntax issue. Take a look at `BasicType2` in the [Declaration section of D's grammar](http://www.digitalmars.com/d/2.0/declaration.html#BasicType2): it states that a `DeclaratorSuffix` can be either a `[ AssignExpression ]` or a `[ Type ]`. When you trace through the possibilities of `AssignExpression` (just click the first item every time... it requires around a dozen clicks), you eventually get to [`PrimaryExpression`](http://www.digitalmars.com/d/2.0/expression.html#PrimaryExpression)-- which has `Identifier`, which is ambiguous with a `Type`... no? – user541686 Aug 08 '11 at 15:31
  • 3
    @Mehrdad: A context-free grammar can indeed be ambiguous, in that a string like `int[T[0]]` can be parsed in multiple ways. You can determine that it is valid syntax without looking at the context, just not what it means. In a context-sensitive language, even the validity of such a syntactic element depends on its context. – hammar Aug 08 '11 at 15:36
  • @hammar: Could you provide an example in C++, where an expression becomes syntactically invalid because of context? I think that would help a lot. – user541686 Aug 08 '11 at 15:38
  • @Mehrdad: Such an example would demonstrate that C++ is context-sensitive, which I don't think it is. I do not know of any such examples. – hammar Aug 08 '11 at 15:45
  • @hammar: How about http://stackoverflow.com/q/1172939/541686 ? BTW it's funny how you say that *both* C++ and D are context-free, whereas AProgrammer says *no* language is context-free... – user541686 Aug 08 '11 at 15:47
  • @Mehrdad let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/2233/discussion-between-hammar-and-mehrdad) – hammar Aug 08 '11 at 15:50
  • @hammar: lol I'm actually at work, I can't really chat right now unfortunately. :\ – user541686 Aug 08 '11 at 16:11
  • By the way, I tend to agree with the down-voters' comments :-( . I'll try and amend my answer later on. – Nicola Musatti Aug 08 '11 at 16:27
  • @Mehrdad: That example only shows ambiguity- not validity or invalidity. The *semantics* of what that call just did doesn't change the fact that it's valid C++. – Puppy Aug 08 '11 at 19:17
  • I hope I answered the (valid!) objections expressed in some comments, without taking away what caused some to appreciate my original answer anyway. – Nicola Musatti Aug 08 '11 at 23:49
15

There are already a lot of good answers, but since you are uninformed about grammars, parsers and compilers etc, let me demonstrate this by an example.

First, the concept of grammars are quite intuitive. Imagine a set of rules:

S -> a T
T -> b G t
T -> Y d
b G -> a Y b
Y -> c
Y -> lambda (nothing)

And imagine you start with S. The capital letters are non-terminals and the small letters are terminals. This means that if you get a sentence of all terminals, you can say the grammar generated that sentence as a "word" in the language. Imagine such substitutions with the above grammar (The phrase between *phrase* is the one being replaced):

*S* -> a *T* -> a *b G* t -> a a *Y* b t -> a a b t

So, I could create aabt with this grammar.

Ok, back to main line.

Let us assume a simple language. You have numbers, two types (int and string) and variables. You can do multiplication on integers and addition on strings but not the other way around.

First thing you need, is a lexer. That is usually a regular grammar (or equal to it, a DFA, or equally a regular expression) that matches the program tokens. It is common to express them in regular expressions. In our example:

(I'm making these syntaxes up)

number: [1-9][0-9]*    // One digit from 1 to 9, followed by any number
                       // of digits from 0-9
variable: [a-zA-Z_][a-zA-Z_0-9]*  // You get the idea. First a-z or A-Z or _
                                  // then as many a-z or A-Z or _ or 0-9
                                  // this is similar to C
int: 'i' 'n' 't'
string: 's' 't' 'r' 'i' 'n' 'g'
equal: '='
plus: '+'
multiply: '*'

whitespace: (' ' or '\n' or '\t' or '\r')*   // to ignore this type of token

So, now you got a regular grammar, tokenizing your input, but it understands nothing of the structure.

Then you need a parser. The parser, is usually a context free grammar. A context free grammar means, in the grammar you only have single nonterminals on the left side of grammar rules. In the example in the beginning of this answer, the rule

b G -> a Y b

makes the grammar context-sensitive because on the left you have b G and not just G. What does this mean?

Well, when you write a grammar, each of the nonterminals have a meaning. Let's write a context-free grammar for our example (| means or. As if writing many rules in the same line):

program -> statement program | lambda
statement -> declaration | executable
declaration -> int variable | string variable
executable -> variable equal expression
expression -> integer_type | string_type
integer_type -> variable multiply variable |
                variable multiply number |
                number multiply variable |
                number multiply number
string_type -> variable plus variable

Now this grammar can accept this code:

x = 1*y
int x
string y
z = x+y

Grammatically, this code is correct. So, let's get back to what context-free means. As you can see in the example above, when you expand executable, you generate one statement of the form variable = operand operator operand without any consideration which part of code you are at. Whether the very beginning or middle, whether the variables are defined or not, or whether the types match, you don't know and you don't care.

Next, you need semantics. This is were context-sensitive grammars come into play. First, let me tell you that in reality, no one actually writes a context sensitive grammar (because parsing it is too difficult), but rather bit pieces of code that the parser calls when parsing the input (called action routines. Although this is not the only way). Formally, however, you can define all you need. For example, to make sure you define a variable before using it, instead of this

executable -> variable equal expression

you have to have something like:

declaration some_code executable -> declaration some_code variable equal expression

more complex though, to make sure the variable in declaration matches the one being calculated.

Anyway, I just wanted to give you the idea. So, all these things are context-sensitive:

  • Type checking
  • Number of arguments to function
  • default value to function
  • if member exists in obj in code: obj.member
  • Almost anything that's not like: missing ; or }

I hope you got an idea what are the differences (If you didn't, I'd be more than happy to explain).

So in summary:

  • Lexer uses a regular grammar to tokenize input
  • Parser uses a context-free grammar to make sure the program is in correct structure
  • Semantic analyzer uses a context-sensitive grammar to do type-checking, parameter matching etc etc

It is not necessarily always like that though. This just shows you how each level needs to get more powerful to be able to do more stuff. However, each of the mentioned compiler levels could in fact be more powerful.

For example, one language that I don't remember, used array subscription and function call both with parentheses and therefore it required the parser to go look up the type (context-sensitive related stuff) of the variable and determine which rule (function_call or array_substitution) to take.

If you design a language with lexer that has regular expressions that overlap, then you would need to also look up the context to determine which type of token you are matching.

To get to your question! With the example you mentioned, it is clear that the c++ grammar is not context-free. The language D, I have absolutely no idea, but you should be able to reason about it now. Think of it this way: In a context free grammar, a nonterminal can expand without taking into consideration anything, BUT the structure of the language. Similar to what you said, it expands, without "looking" anywhere else.

A familiar example would be natural languages. For example in English, you say:

sentence -> subject verb object clause
clause -> .... | lambda

Well, sentence and clause are nonterminals here. With this grammar you can create these sentences:

I go there because I want to

or

I jump you that I is air

As you can see, the second one has the correct structure, but is meaningless. As long as a context free grammar is concerned, the meaning doesn't matter. It just expands verb to whatever verb without "looking" at the rest of the sentence.

So if you think D has to at some point check how something was defined elsewhere, just to say the program is structurally correct, then its grammar is not context-free. If you isolate any part of the code and it still can say that it is structurally correct, then it is context-free.

Shahbaz
  • 46,337
  • 19
  • 116
  • 182
  • showing what's the difference between context free grammar and context sensitive is just 3 lines :D. Upvote for the effort in creating a wall of text. "A context sensitive grammar is one that expand a particular combination of terminal/non terminals to another sequence, context free grammar just expand 1 non-terminal to a sequence" – CoffeDeveloper Feb 01 '15 at 17:28
  • 3
    @Dariooo, it could even be 1 line, if you wtite it mathematically, but most people still need explanation to understand. ;) thanks for the upvote – Shahbaz Feb 02 '15 at 08:26
10

There is a construct in D's lexer:

string ::= q" Delim1 Chars newline Delim2 "

where Delim1 and Delim2 are matching identifiers, and Chars does not contain newline Delim2.

This construct is context sensitive, therefore D's lexer grammar is context sensitive.

It's been a few years since I've worked with D's grammar much, so I can't remember all the trouble spots off the top of my head, or even if any of them make D's parser grammar context sensitive, but I believe they do not. From recall, I would say D's grammar is context free, not LL(k) for any k, and it has an obnoxious amount of ambiguity.

Ellery Newcomer
  • 1,594
  • 1
  • 11
  • 23
  • 1
    A fact I remember from language theory is that identifier stuff sameidentifier in general is not context free. Before I posted, I ran the above production rule through the pumping lemma just to be sure. – Ellery Newcomer Aug 16 '11 at 19:09
  • 1
    Voted up because this is the only actual attempt to answer the question. – johncip Jun 05 '13 at 21:59
8

The grammar cannot be context-free if I need I can't tell the type of an expression just by looking at it.

No, that's flat out wrong. The grammar cannot be context-free if you can't tell if it is an expression just by looking at it and the parser's current state (am I in a function, in a namespace, etc).

The type of an expression, however, is a semantic meaning, not syntactic, and the parser and the grammar do not give a penny about types or semantic validity or whether or not you can have tuples as values or keys in hashmaps, or if you defined that identifier before using it.

The grammar doesn't care what it means, or if that makes sense. It only cares about what it is.

Puppy
  • 144,682
  • 38
  • 256
  • 465
  • @Hans: Pretty sure that `context` is only things like "What identifiers have been declared so far?". LALR grammars are context-free, but they still use the parser state to determine meaning. – Puppy Aug 08 '11 at 14:48
  • Nothing prevent a string to be generated by two different non-terminals in a CFG. So just looking at that string won't indicates from which non-terminal it is generated. You need some context. But the context to take into account is more localized in CFG than in more powerful grammar. – AProgrammer Aug 08 '11 at 15:00
  • @DeadMG: `The grammar doesn't care what it means, or if that makes sense. It only cares about what it is.` --> I'm confused. So you mean `omg(!)` is follows D syntax? Everything in that string looks right, except that an exclamation point isn't an identifier or a number... – user541686 Aug 08 '11 at 15:35
  • I wouldn't know enough about D's syntax to know about a given string. – Puppy Aug 08 '11 at 16:32
6

To answer the question of if a programming language is context free you must first decide where to draw the line between syntax and semantics. As an extreme example, it is illegal in C for a program to use the value of some kinds of integers after they have been allowed to overflow. Clearly this can't be checked at compile time, let alone parse time:

void Fn() {
  int i = INT_MAX;
  FnThatMightNotReturn();  // halting problem?
  i++;
  if(Test(i)) printf("Weeee!\n");
}

As a less extreme example that others have pointed out, deceleration before use rules can't be enforced in a context free syntax so if you wish to keep your syntax pass context free, then that must be deferred to the next pass.

As a practical definition, I would start with the question of: Can you correctly and unambiguously determine the parse tree of all correct programs using a context free grammar and, for all incorrect programs (that the language requires be rejected), either reject them as syntactically invalid or produce a parse tree that the later passes can identify as invalid and reject?

Given that the most correct spec for the D syntax is a parser (IIRC an LL parser) I strongly suspect that it is in fact context free by the definition I suggested.


Note: the above says nothing about what grammar the language documentation or a given parser uses, only if a context free grammar exists. Also, the only full documentation on the D language is the source code of the compiler DMD.

BCS
  • 75,627
  • 68
  • 187
  • 294
  • +1 interesting point about syntax vs. semantics. However: "Can you correctly and unambiguously determine the parse tree of all correct programs using a context free grammar" --> Since associative arrays and statically sized arrays aren't part of the same parse tree (they're defined with a different syntax), they require potential *transformations* when used in a template, depending on context. Of course, the compiler compiles everything correctly, but the initial parse tree isn't correct. Is that context-free? Not to me, but I'm not sure. – user541686 Aug 08 '11 at 16:10
  • Concerning my overflow example; I'd have to check but it might even be illegal to allow an overflow even if its result is never used. OTOH that doesn't change the point of the example. – BCS Aug 08 '11 at 16:13
  • @Mehrdad: all that implies is that the document grammar (which is know to be wrong: e.g. `a*b=c;` is ambiguous in that grammar but unambiguously parsed by DMD) is not context free. With some trivial adjustments to the grammar, you can merge the productions so that it is unambiguous. You won't necessarily know know what the thing in the `[]` *is* but you will have correctly parsed it. – BCS Aug 08 '11 at 16:18
  • @BCS In D, `a*b=c;` is a declaration of the variable `b` as a pointer to `a` initialized by the expression `c`. It is so by definition; if `a` doesn't resolve to some type, the program is ill-formed, even if `a * b` as a multiplication expression would return something that could be assigned `c`. For the assigned multiplication expression, you need parentheses. It might be that this is still not context-free, but `a*b=c;` is not a limbo statement. – Quirin F. Schroll Dec 10 '19 at 18:45
  • @Bolpat IIRC when I posted that comment ~9y ago the grammar in the *documentation* technically allowed the `MulExp = Exp;` parsing. With regards to how the compiler interpreted it you are however correct; it is unambiguous. Which was in fact my point; the grammar in the documentation and the grammar in the source code were not the same. – BCS Jan 30 '20 at 19:08
4

These answers are making my head hurt.

First of all, the complications with low level languages and figuring out whether they are context-free or not, is that the language you write in is often processed in many steps.

In C++ (order may be off, but that shouldn't invalidate my point):

  1. it has to process macros and other preprocessor stuffs
  2. it has to interpret templates
  3. it finally interprets your code.

Because the first step can change the context of the second step and the second step can change the context of the third step, the language YOU write in (including all of these steps) is context sensitive.

The reason people will try and defend a language (stating it is context-free) is, because the only exceptions that adds context are the traceable preprocessor statements and template calls. You only have to follow two restricted exceptions to the rules to pretend the language is context-free.

Most languages are context-sensitive overall, but most languages only have these minor exceptions to being context-free.

Sophie McCarrell
  • 2,831
  • 8
  • 31
  • 64
  • I wish I saw this question 3 years ago after taking courses on compilers, etc. I remember understanding this stuff well in school, but without any huge interest in compilers, I'm afraid I haven't studied them in many years. – Sophie McCarrell Aug 08 '11 at 17:35