I learned about flex and bison in school over the summer, and now I want to dive a little deeper. I'm having trouble understanding the documentation for Bison 3.0.2, so maybe some of you could help me out. I want to parse a string representing an equation and, at the same time, fill in a data structure that records what was parsed. For instance, say I have (ax+b)^2. I want the parser to generate a structure whose entries each hold a string and an integer constant, and that looks like the following.
( BEGINGROUP
a VARIABLE
x VARIABLE
+ ADDITION
b VARIABLE
) ENDGROUP
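A minimal sketch of such a structure, for illustration only. The real equation.h is not shown in this post, so the Element struct, the elements vector and the expectedForGroup() helper are assumptions; only addElement and the enumerator names are taken from the calls in math.l and math.y.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the Equation class (the real equation.h is not shown).
class Equation {
public:
    // Integer constants stored next to each string; names follow the calls in math.l.
    enum Kind { number, variable, addition, multiplication, begGroup, endGroup };

    // One entry of the structure: the matched text plus its constant.
    struct Element {
        std::string text;
        Kind kind;
    };

    // Append one entry; an empty string would point to a bug in the caller.
    void addElement(const std::string& text, Kind kind) {
        assert(!text.empty());
        elements.push_back(Element{text, kind});
    }

    std::vector<Element> elements;
};

// Builds by hand the listing shown above for the group "(ax+b)".
inline Equation expectedForGroup() {
    Equation e;
    e.addElement("(", Equation::begGroup);
    e.addElement("a", Equation::variable);
    e.addElement("x", Equation::variable);
    e.addElement("+", Equation::addition);
    e.addElement("b", Equation::variable);
    e.addElement(")", Equation::endGroup);
    return e;
}
```

The goal is for the scanner and parser together to produce this sequence of entries, instead of building it by hand.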
I have created a language specification using flex, and I have created a grammar using bison. All that's needed is to get the two to put information into the structure. I have some code which works in roughly the way I want it to, but I can't help thinking I'm missing something. In the Bison documentation examples, I see $$ and $1 being used, which they say refer to semantic values, but when I print the semantic values I always get zero. Anyway, my code is posted below.
math.l
%{
#include "equation.h"
Equation* equ;
void setEquation(Equation* equation) {
equ = equation;
}
%}
/* Definitions */
space [ \t]
digit [0-9]
letter [a-zA-Z]
number ({digit}+|{digit}+"."{digit}+|"."{digit}+|{digit}+".")
variable {letter}
/* actions */
%%
{space} ;
{number} equ->addElement(yytext, Equation::number); return(1);
{variable} equ->addElement(yytext, Equation::variable); return(2);
"+" equ->addElement(yytext, Equation::addition); return(10); /* Basic operators */
"-" return(11);
"*" return(12);
"/" return(13);
"^" return(14);
"log" return(15);
"sin" return(20); /* Trigonometric Functions */
"cos" return(21);
"tan" return(22);
"csc" return(23);
"sec" return(24);
"cot" return(25);
"arcsin" return(26);
"arccos" return(27);
"arctan" return(28);
"(" equ->addElement(yytext, Equation::begGroup); return(30); /* Grouping Operators */
")" equ->addElement(yytext, Equation::endGroup); return(31);
"[" return(32);
"]" return(33);
"," return(34);
. fprintf(stderr, "Error on character %s\n", yytext);
math.y
/*
* Implement grammar for equations
*/
%{
#include "lex.yy.c"
#include "equation.h"
#include <iostream>
int yylex(void);
int yyerror(const char *msg);
void output(const char* where) {
std::cout << where << ": " << yytext << std::endl;
}
%}
%token e_num 1
%token e_var 2
%token e_plus 10
%token e_minus 11
%token e_mult 12
%token e_div 13
%token e_pow 14
%token e_log 15
%token e_sin 20
%token e_cos 21
%token e_tan 22
%token e_csc 23
%token e_sec 24
%token e_cot 25
%token e_asin 26
%token e_acos 27
%token e_atan 28
%token lparen 30
%token rparen 31
%token slparen 32
%token srparen 33
%token comma 34
%start Expression
%%
Expression : Term MoreTerms
| e_minus Term MoreTerms
;
MoreTerms : /* add a term */
e_plus Term MoreTerms
| /* subtract a term */
e_minus Term MoreTerms
| /* add a negative term */
e_plus e_minus Term MoreTerms
| /* subtract a negative term */
e_minus e_minus Term MoreTerms
| /* no extra terms */
;
Term : Factor MoreFactors { equ->addElement("*", Equation::multiplication); }
;
MoreFactors: e_mult Factor MoreFactors
| e_div Factor MoreFactors
| Factor MoreFactors
|
;
Factor : e_num { std::cout << $1 << std::endl; } //returns zero no matter where I put this
| e_var
| Group
| Function
;
BeginGroup : lparen | slparen;
EndGroup : rparen | srparen;
Group : BeginGroup Expression EndGroup
;
Function : TrigFuncs
| PowerFunc
;
TrigFuncs : e_sin lparen Expression rparen
| e_cos lparen Expression rparen
| e_tan lparen Expression rparen
| e_csc lparen Expression rparen
| e_sec lparen Expression rparen
| e_cot lparen Expression rparen
| e_asin lparen Expression rparen
| e_acos lparen Expression rparen
| e_atan lparen Expression rparen
;
PowerFunc : e_num e_pow Factor
| e_var e_pow Factor
| Group e_pow Factor
| TrigFuncs e_pow Factor
;
I think it's pretty clear what I'm doing. As you can see, the scanner stores yytext in the equation class along with its code, but sometimes the parser must add information to the equation class as well, and this is where things get hectic. For one, adding an action before or in the middle of a rule can lead to massive shift/reduce conflicts. Secondly, putting the action at the end of a rule records things out of order. Look at the rule for Term. If I type "ax", this implicitly means "a" times "x", or "a*x". I want the parser to add the multiplication to my structure in the right place, but the parser does this out of order. So is there a better way to accomplish my goal?
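To make the ordering problem concrete, here is a small stand-alone sketch that replays the addElement calls made for the input "ax". It does not involve flex or bison. Recorder, replayImplicitProduct() and join() are hypothetical names used only for this illustration, and the kind argument is dropped because only the order of the calls matters.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical recorder standing in for Equation; it only remembers call order.
struct Recorder {
    std::vector<std::string> items;
    void addElement(const std::string& text) { items.push_back(text); }
};

// Replays, in the order they happen, the calls made while parsing "ax".
inline std::vector<std::string> replayImplicitProduct() {
    Recorder equ;
    equ.addElement("a");  // scanner action for {variable}, runs when "a" is matched
    equ.addElement("x");  // scanner action for {variable}, runs when "x" is matched
    equ.addElement("*");  // end-of-rule action on Term, runs when the rule is reduced
    assert(equ.items.size() == 3);  // three calls, three entries
    return equ.items;
}

// Joins the recorded items with single spaces so the order is easy to read.
inline std::string join(const std::vector<std::string>& items) {
    std::string out;
    for (std::size_t i = 0; i < items.size(); ++i) {
        if (i != 0) out += ' ';
        out += items[i];
    }
    return out;
}
```

Calling join(replayImplicitProduct()) gives "a x *", whereas the order I am after is "a * x".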