1

So i am trying to write grammar for a simple c++ program.

this is how the grammar looks like right now:

PDefs. Program ::= [Def] ;
terminator Def "" ;
comment "//" ;
comment "/*" "*/" ;
comment "#" ;
DFun. Def ::= Type Id "(" [Arg] ")" "{" [Stm] "}" ;
separator Arg "," ;
terminator Stm "" ;
ADecl. Arg ::= Type Id ;
SExp. Stm ::= Exp ";" ;
SDecl. Stm ::= Type Id ";" ;
SDecls. Stm ::= Type Id "," [Id] ";" ;
SInit. Stm ::= Type Id "=" Exp ";" ;
SReturn. Stm ::= "return" Exp ";" ;
SWhile. Stm ::= "while" "(" Exp ")" Stm ;
SBlock. Stm ::= "{" [Stm] "}" ;
SIfElse. Stm ::= "if" "(" Exp ")" Stm "else" Stm ;


EInt. Exp15 ::= Integer ;
EDouble. Exp15 ::= Double ;
ETrue. Exp15 ::= "true" ;
EFalse. Exp15 ::= "false" ;
EId. Exp15 ::= Id ;
EApp. Exp15 ::= Id "(" [Exp] ")" ;
EPIncr. Exp14 ::= Exp15 "++" ;
EPDecr. Exp14 ::= Exp15 "--" ;
EIncr. Exp13 ::= "++" Exp14 ;
EDecr. Exp13 ::= "--" Exp14 ;
ETimes. Exp12 ::= Exp12 "*" Exp13 ;
EDiv. Exp12 ::= Exp12 "/" Exp13 ;
EPlus. Exp11 ::= Exp11 "+" Exp12 ;
EMinus. Exp11 ::= Exp11 "-" Exp12 ;
ELt. Exp9 ::= Exp9 "<" Exp10 ;
EGt. Exp9 ::= Exp9 ">" Exp10 ;
ELtEq. Exp9 ::= Exp9 "<=" Exp10 ;
EGtWq. Exp9 ::= Exp9 ">=" Exp10 ;
EEq. Exp8 ::= Exp8 "==" Exp9 ;
ENEq. Exp8 ::= Exp8 "!=" Exp9 ;
EAnd. Exp4 ::= Exp4 "&&" Exp5 ;
EOr. Exp3 ::= Exp3 "||" Exp4 ;
EAss. Exp2 ::= Exp3 "=" Exp2 ;

coercions Exp 15 ;
separator Exp "," ;
separator Id "," ;

Tbool. Type ::= "bool" ;
Tdouble. Type ::= "double" ;
Tint. Type ::= "int" ;
Tvoid. Type ::= "void" ;

token Id (letter (letter | digit | '_')*) ;

and this is the simple c++ program that needs to be parsed

// a small C++ program
#include <iostream>

int main()
{
    std::cout << "Hello, world!" << std::endl;
    return 0;
}

so when i try to parse it i get the error in line 6 meaning the std::cout line. Since i am new to bnf i do not know how to "think" to solve this. If someone could give an example of how you would go to solve a situation like this would be great.!

Thank you!

  • 2
    You don't have a rule for accepting `<<` tokens. – TartanLlama Jan 30 '15 at 09:52
  • what about (std:: cout) or endln? do i need rules for them to, cause i have not made a rule about them? @TartanLlama –  Jan 30 '15 at 09:55
  • 1
    @TartanLlama : nor for namespace qualifiers – Sander De Dycker Jan 30 '15 at 09:57
  • Looks like you don't have rules for namespaces either. `std::cout` could be lexed like ID(std) SCOPE ID(cout). – TartanLlama Jan 30 '15 at 09:57
  • Ok, so i get that the variable to the left is a name that i can define and choose. what about the (Exp(number)) mean and how would i know which number to choose? –  Jan 30 '15 at 09:58
  • Ok, so where did you get that information from? If a total beginner wants to know how to do that ? @TartanLlama –  Jan 30 '15 at 10:00
  • Why are you writing this grammar? In order to learn how they work? C++ is a beast to parse because it is very context sensitive. I'd recommend starting with an easier language, maybe a subset of C. – TartanLlama Jan 30 '15 at 10:01
  • I know c,c++, objective-c and Java. In programming terms im not a beginner. This is stuff we got from school to do... @TartanLlama –  Jan 30 '15 at 10:06
  • If you want to know how C++ is supposed to be parsed, the official reference is the C++ standard. Here's a browsable version of the C++ 98 grammar if you're interested : http://slps.github.io/zoo/cpp/iso-14882-1998.html - for your problem, have a look at the "nested-name-specifier" rule – Sander De Dycker Jan 30 '15 at 10:09
  • Thank you for the link. When i look at the "nested-name-specifier" rule it does not tell me anything at all. I still do not get how to connect that with the problem i have? Please do provide and example so i can learn from this... @SanderDeDycker –  Jan 30 '15 at 10:15
  • This seems like a very strange exercise. You aren't going to write a (even semi) realistic grammar for C++ for an assignment. Nor do you have the energy to read the C++ standard in detail, to get the necessary background to do that. So at best you are going to write a bad grammar that might read this particular program. If the goal is to make you write and test a grammar I guess this is ... barely ... ok. But it means you can write any bad grammar that works, including the stupid one rule version that says "pgm = ". I rather wonder about your teacher. – Ira Baxter Jan 30 '15 at 12:49
  • Given that you are being asked to write a bad grammar, then a simple procedure is to take whatever grammar you have (this one), and, when you encounter a syntax error, pretty much add a rule to cover that syntax. To the extent that you can distinguish expressions from statements, you can write corresponding rules. Typical expressions are incredibly straightforward to write rules for. Typical statements pretty much one rule per statement variation. – Ira Baxter Jan 30 '15 at 12:57

1 Answers1

1

The line on which it fails, cannot be parsed because you are missing some rules :

  1. You need a rule for parsing qualified ids.
    A qualified id is a special type of identifier, and can (for your purposes) be used in the same situations as an (unqualified) identifier.
    std::cout and std::endl are qualified ids, and a (simplified) rule for them could look something like this :

    <qualified_id> ::= <nested_name_specifier> <unqualified_id>
    <nested_name_specifier> ::= <namespace_name> "::" <nested_name_specifier>?
    

    in which (for your purposes), <unqualified_id> and <namespace_name> can be treated as identifiers.

  2. You need a rule for parsing an expression with the << operator.
    A (simplified) rule for this additional type of expression could look something like this :

    <shift_left_expression> ::= <other_expression>
    <shift_left_expression> ::= <shift_left_expression> "<<" <other_expression>
    

    in which (for your purposes) <other_expression> stands for any other type of expression.

  3. You need a rule for parsing string literals.
    A string literal is a type of literal, and it can be used (for your purposes) as part of an expression, like an identifier.
    "Hello, world!" is a string literal, and a (simplified) rule for them could look something like this :

    <string_literal> ::= "\"" <s_char_sequence>? "\""
    <s_char_sequence> ::= <s_char>
    <s_char_sequence> ::= <s_char_sequence> <s_char>
    

    in which <s_char> is any character that you want to allow inside a string literal (to keep it simple, don't allow the " character in there eg.).

Sander De Dycker
  • 16,053
  • 1
  • 35
  • 40
  • so for example looking at this pdf. http://www.cse.chalmers.se/edu/year/2012/course/DAT150/lectures/plt-book.pdf at page 43 on the table. Would i write as "letter". –  Jan 30 '15 at 11:14
  • @Meruem : your example has a string literal with letters (both lowercase and uppercase), spaces, and special characters (comma and exclamation mark). So, you will want to at least support those. But probably you want to support anything that would (reasonably) appear in written text. – Sander De Dycker Jan 30 '15 at 12:57