3

I have been looking for this, but there is a lot of different answers to this question on the MSDN forums.

Some people say that "All computer language grammars are context free" and others say that any language that has white-space sensitive syntax is likely context-sensitive and therefore not context-free (F# and Python).

Whould be nice a definitive answer and maybe some proof.

Tomas Ramirez Sarduy
  • 17,294
  • 8
  • 69
  • 85
  • Literally from Wikipedia: _Every context-free language is context-sensitive._ It's from the article "context-sensitive-language", section "properties of context-sensitive languages". What's going on here? – 11684 Nov 17 '12 at 08:36
  • 2
    @11684, Yes, that's the [Chomsky hierarchy](http://en.wikipedia.org/wiki/Chomsky_hierarchy)... all context-free languages are context-sensitive, but not vice versa ;) – Tomas Ramirez Sarduy Nov 17 '12 at 08:43
  • Then that kind of answers your question, doesn't it? C# is a context-free languages and thus context-sensitive too. – 11684 Nov 17 '12 at 08:48
  • 1
    Those languages (F# and python) are usually described as having *semantic* whitespace not context-sensitive (in the technical sense of grammars). The full python grammar is here and appears to be a BNF: http://docs.python.org/2/reference/grammar.html – Mike Zboray Nov 17 '12 at 08:49
  • @mikez: +1 for the python grammar, interesting... – Tomas Ramirez Sarduy Nov 17 '12 at 08:53
  • 1
    @11684: No, that not answer if **C# is contet-free** and why. I'm not asking if context-free languages are context-sensitive – Tomas Ramirez Sarduy Nov 17 '12 at 08:54
  • @TomSarduy the cited python grammar seem to treat indentation as a symbol (instead of whitespace), it says: `suite: simple_stmt | NEWLINE INDENT stmt+ DEDENT`. This way the grammar does not define how to parse indentation but requires a parser that extracts and interprets whitespace in a special way (I use the word "parser" loosely, what I describe is probably responsibility of the scanner). – Theraot Apr 27 '13 at 05:23

2 Answers2

11

I would describe C# as having a context-free grammar, but the language has context-sensitive rules that are not expressed in the grammar.

From Wikipedia (Formal grammar):

A context-free grammar is a grammar in which the left-hand side of each production rule consists of only a single nonterminal symbol.

From the C# 4.0 specification, section 2.2.1 (Grammar notation):

The lexical and syntactic grammars are presented using grammar productions. Each grammar production defines a non-terminal symbol and the possible expansions of that non-terminal symbol into sequences of non-terminal or terminal symbols.

As I read it, that means that the production rules defining the C# language are context-free. The left-hand side of each production rule is a single non-terminal symbol. By contrast, a context-sensitive grammar may have multiple terminal and non-terminal symbols on the left-hand side of a production rule.

There are, however, many rules in the specification that are context-sensitive. For example, "A local variable must be definitely assigned (§5.3) at each location where its value is obtained." Also, "The signature of a method must be unique in the class in which the method is declared." These are both cases where the validity of any particular fragment depends on the context in which it appears.

Of course, many programming languages have similar requirements, including C. I doubt most would consider C a context-sensitive language. I think this answer summarized it well:

The set of programs that are syntactically correct is context-free for almost all languages. The set of programs that compile is not context-free for almost all languages.

As for Python and F#, as I said in my comment, those languages are usually described as having semantic (or sometimes syntactic) whitespace not context-sensitive.

Community
  • 1
  • 1
Mike Zboray
  • 39,828
  • 3
  • 90
  • 122
0

C# is not context free. Consider the input sequence '>>' in these contexts:

List<List<int>> foo;
int x = 1234 >> 1;

in the first case the lexer returns a separate token for each '>'. It's OK to put white space or even a comment between them. 'List<List /* comment */ >' is a perfectly valid type!

In the second case the lexer returns a single token for the operator '>>' (and also for '>>='). The expression '1234 > > 1' will produce a syntax error.

The lexer needs to know if the context is of a type or an expression, therefore C# is not context free grammar,

  • 1
    The Lexer isn't really aware of the context, in both cases it will produce two `GreaterThanToken`s. The parser then *fabricates* a new token based on the context. See https://github.com/dotnet/roslyn/blob/62646c22f6bd7b213e7e15dbc0dfadfe47a1e30f/src/Compilers/CSharp/Portable/Parser/LanguageParser.cs#L11067-L11073 and https://github.com/dotnet/roslyn/blob/62646c22f6bd7b213e7e15dbc0dfadfe47a1e30f/src/Compilers/CSharp/Portable/Parser/Lexer.cs#L4118-L4122 – Youssef13 Nov 25 '22 at 14:11