Semantic vs Syntax analyzing in Programming

Question

I am new to compiler construction and I was trying to make a CFG(Context Free Grammar) of Assignment Statement in programming for Syntax Analyzer in Compiler Construction and I want to know whether this illegal statement is a semantic error or a syntax error ?

5=a;

thanks!

Quite often, BNF for an assignment statement would look like `assign := "=" `, where syntax for the `` is a subset of an expression syntax and does not include literals, binary operations, etc. But, yet, it is totally legitimate approach to expect the same expression syntax on both sides, and then check if a destination expression is legal in one of the consequent semantic passes. Do whatever you like, both ways are ok. — SK-logic, Oct 29 '16 at 14:51

Ira Baxter · Accepted Answer · 2016-10-29T15:58:54.113

The distinctions we use between "syntax" and "semantic" seem to me to be largely an accident of how we currently build compilers with "weak parsers".

The purpose of a compiler is to recognize that is has been given a valid program, to diagnose errors in that program where practical, and to compile that code into an executable form.

How it recognizes a valid program is usually accomplished by using a parser which knows something about the syntactic structure of the program (in many cases driven explicitly by a grammar), followed up by a set of "semantic" checks to validate that the provided structures don't violate the constraints as defined by the language reference manual.

As a practical issue, one cannot define a "parser" the checks all the "syntax" constraints: the parsing technology is often (always!) too weak. We settle for parsers that at best check the context-free structural properties of the program (e.g, "parentheses balance"). Everything else we push into "semantic checking" (by virtue of being the only other place according to the definition in the preceding paragraph).

So, one can define a really weak parser that simply reads characters and accepts whatever the character stream is (surely, your program source is made of characters :), and relegates everything else to "semantic checking". Any additional syntax checking that our chosen parser technology might do is just (admittedly very convenient) gravy.

So, yes, you can define a parser that accepts "5=a;" as matching (some of) the syntax constraints, and do a semantic check later that the left hand side is valid.

With most traditional parser generators (or even a hand-rolled recursive descent parser), you can, with some modest effort, usually define a grammar for your language which will reject "5=a;" as a "syntax error". Because this is so easy, we often assume that such checks are actually done by parsing, and so we would normally say this is a "syntax error" regardless of the parsing technology used, even though that is sloppy.

In contrast, "String S=3.7;" is probably accepted by our parser; the type inconsistency is probably done by a semantic check, so we would say that type checking is a semantic error. But this interpretation comes about because most compilers happen to be built in way that this is true.

If you have a sufficiently powerful (Turing-capable) specification system, (e.g, Van Wingaarden Grammars or Meta-S), you can in fact encode what you think of as all the "syntax" and "semantics" in the same formalism, and have that formalism "executed" to validate your source code. If that engine complains, is it a "syntax error" or a "semantic error"? In this case, we no longer have separate "parsing" and "semantic checking" so it gets difficult to say. At best you can say you have an error "parsing the source text as a valid program".

It should be noted that rejecting `5=a` at the syntax level is not possible with an LL(k) parser (unless I'm missing something major) and requires arbitrary backtracking in a recursive descent parser. The official C grammar also does not reject it (`assignment-expression: unary-expression assignment-operator assignment-expression`), so I don't think it's as common as you make it sound. — sepp2k, Oct 30 '16 at 06:27
@sepp2k: For an LL parser, in general surely I can write a grammar rule "assignment_stmt = target '=' expression; " and then write specialized rules to define target. This would be problematic only if the language also allowed "some_stmt = expression;" which only makes sense if you mix assignment operators in the middle of the expression. That is true of C, and many language copy C's style but not necessarily everything about C, and there are oodles of languages that don't copy C's style. So we can argue about how common. The official C grammar likewise could have been written differently — Ira Baxter, Oct 30 '16 at 10:38
Let's say you do that. For an LL grammar, you'd end up with something like `target : primary_target ('[' expression ']')?; primary_target : identifier | '(' target ')'`, right? Now you'll also have `postfix_expression : primary_expression ('[' expression ']' | ...)?` and `primary_expression : identifier;` and `expression_statement : expression ';'` where `expression` can be derived to `postfix_expression` and `statement` to `assignment_statement` and `expression_statement`. Now both `assignment_statement` and `expression_statement` can both start with a prefix of the form ... — sepp2k, Oct 30 '16 at 13:06
... `identifier '[' expression ']'` and you won't be able to know which production to take until you encounter a `=` or `;`. Since the index part can be arbitrarily long, a constant amount of lookahead won't suffice to make this decision. Therefore it's not LL(k) and I'm pretty sure you can't rewrite it a way that is. — sepp2k, Oct 30 '16 at 13:10
You could also try to restrict expression statement to only expressions that could contain side effects, so `arr[i];` would no longer be a valid statement. But you'd still need to allow something like `arr[i]()` or `arr[i].m()`, so you'd still run into the common prefix problem. — sepp2k, Oct 30 '16 at 13:12
Yes, I agree, that was exactly my point in 2nd sentence of my first response to you. But there are many, many languages that insist on keywords as prefixes to assignment statements, or which have keywords starting every statement except assignment statements, and in these cases there is no conflict. (PS: all these problems of LL/recursive descent parsing are why I chose to build our tools around GLR parsers. So many stupid parsing problems *just vanish*. See http://stackoverflow.com/questions/243383/why-cant-c-be-parsed-with-a-lr1-parser/1004737#1004737) — Ira Baxter, Oct 30 '16 at 13:49
For some reason I misread your second sentence as saying "It's only problematic if assignments are also expressions". Sorry for the noise then. — sepp2k, Oct 30 '16 at 14:00
That's ok, this is good reading for those that haven't been down this path. — Ira Baxter, Oct 30 '16 at 15:42

Semantic vs Syntax analyzing in Programming

1 Answers1