YACC|BISON :How do I manipulate parse tree?

Question

The goal of my application is to validate an sql code and generate,in the mean time, from that code a formatted one with some modification.
For example this where clause :

where
e.student_name= c.contact_name and ( c.address = " nefta" or c.address=" tozeur ") and
e.age <18

we will have as formatted output something like that :

where e.student_name= c.contact_name
and (c.address=trim("nefta")
or c.address=trim("tozeur") )
and e.age <18

I hope I've explained my aim well

The problem is grammars may contain recursive rules which make the rewrite task unreliable ; for instance in my sql grammar i have this :

search_condition         : search_condition OR search_condition{clbck_or}
                         | search_condition AND search_condition{clbck_and}
                         | NOT search_condition {clbck_not}
                         | '(' search_condition ')'{clbck__}
                         | predicate {clbck_pre}
                         ;

Knowing that I specified a precedence priority to solve shift reduce problems

%left OR
%left AND
%left NOT

So back on the last example ; my clause where will be consumed this way:

c.address="nefta"or c.address="tozeur" -> search_condition
(c.address="nefta"or c.address="tozeur")->search_condition
e.student_name= c.contact_name and (c.address="nefta"or c.address="tozeur")-> search_condition
... and e.age<18-> search_condition

You can by the way understand that it's tough to rebuild the input stream referring to callbacks triggered by each reduction cause the order is not the same.

Any help for this problem ?

What exactly is the problem? Do you want to tranform the query tree after is is complete? Do you want to change the "order" of terms in the clause? — wildplasser, Jan 11 '12 at 16:52
@wildplasser the problem that the order of clbcks are triggered is not the same as the input is written ; i want to rewrite the where clause with the same order without any change — Aymanadou, Jan 11 '12 at 16:54
The ordering in the tree nodes is dictated by the left/right associativity and the precedence rules. When walking or manipulating you can choose to walk preorder / inorder / postorder. And you can rewrite ((a&b)&c) into (a&(b&c), and even ((c&b)&a) The same for OR. (but not for AND and OR combined, of course) — wildplasser, Jan 11 '12 at 17:07

score 2 · Accepted Answer · answered Jan 12 '12 at 04:20

2

Your question is a bit vague, so I'm guessing that you actually print in your clbck_or (), etc. The "common" way to which wildplasser has alluded is to use "semantic values", i. e. (untested):

search_condition         : search_condition OR search_condition{$$ = clbck_or($1, $3);}
                         | search_condition AND search_condition{$$ = clbck_and($1, $3);}
                         | NOT search_condition {$$ = clbck_not($2);}
                         | '(' search_condition ')'{$$ = clbck__($2);}
                         | predicate {$$ = clbck_pre($1);}
                         ;

If you're using Bison, the manual has a fine example in the section "Infix Notation Calculator: `calc'". With strings and C, you will have to add some memory handling.

answered Jan 12 '12 at 04:20

Tim Landscheidt

1,400
1
15
20

If only precedence was used to build the expression tree, he'll have to use precedence to unbuild it. (A+B)*C likely gets tree-encoded as [[A B+] C *] with loss of parentheses; a naive walk in tree order would print A+B*C So he needs to regenerate the parentheses using the same precedence information. [your example appears to keep parentheses in the tree which is unusual for an "abstract syntax tree"; I happen to agree this is a good idea if you want to regenerate source text] – Ira Baxter Jan 12 '12 at 07:34
Why do you want to build a separate tree? I read the question as that the enquirer wants to transform some SQL. – Tim Landscheidt Jan 12 '12 at 07:56
@TimLandscheidt Yes I actually do some prints(concatenate string in a global buffer) in my clbacks, your answer seems simple i will test this rapidly :) thank you – Aymanadou Jan 12 '12 at 08:28
So maybe I misunderstood what was being suggested here; these callbacks aren't building the tree, they're just executing as the parsing runs? If that's the case, any "transformation" that might be done better only use left-to-right information flows. You couldn't use this to rewrite A+B into B+A because when it processes/spits A, it hasn't seen B yet. The reason for building a tree is to allow information in arbitrary parts of the tree to flow to arbitrary other parts. Maybe OP's transformation doesn't need that. But then, he only showed us one example. – Ira Baxter Jan 12 '12 at 08:37

score 1 · Answer 2 · edited May 23 '17 at 11:59

Bison is good at parsing, and with some manual help, good at building a custom syntax tree. After that, its up to you to do what you want with the tree. The good news is you can do whatever you want. The bad news is you still have to build a lot of machinery to do what you want. Your basic problem of regenerating source code is called "prettyprinting"; see my SO answer on how to prettyprint to understand what it takes to do this, including all the peccadillos of lexical syntax (you don't to lose the escapes in your literal strings, right?). You didn't at all address how to find the construct you wanted to change in the tree, or how you'd smash the tree to change it.

If you don't want to do all of that, then what you really want is a program transformation system, which is good at parsing, building a syntax tree for you (so you don't have to think about it, SQL is pretty big grammar), will let you find patterns in the tree in terms of SQL syntax you are used to, make tree changes without knowing much about the shape of the tree, and can finally regenerate valid source text by prettyprint as I describe in my answer link above. (A program transformation systems essentially includes a parser as a subroutine).

Our DMS Software Reengineering Toolkit is such a program transformation system. It has a set of predefined language definitions including SQL2011 and means for configuring for a particular dialect. Using DMS source-to-source syntax rules, you could carry out the change in your example with the following rule:

 domain SQL;

 rule trim_c_members(f: identifier, s: string):condition->condition
 = " c.\f = \s " ->  " c.\f = trim(\s) ";

This is DMS Rule language (meta) syntax to describe a rewrite on ("domain") SQL code. The rule has a name (because in complex application there's lot of rules) and it as syntactic place holders "f" and "s"; it rewrites only conditions in the code. The quotes are RSL meta-quotes; stuff inside is SQL with RSL metavariables "\f" and "\s"; stuff outside is RSL rule syntax. What the rule says is, "for any condition on a variable explicitly named 'c', with any field f, if that field is compared by equality to some literal string, then replace the literal string by 'trim' applied to the literal string".

I left out some code that basically says, "apply this rule to the entire tree, and don't apply it twice in the same place". That "strategy" is one of many built into DMS.

There's the question of how does the rule work. that is accomplished by DMS applying the SQL parser to the meta-quoted strings, to produce "pattern" syntax trees with placeholders where the metavariables are written. The left hand side pattern tree is then matched against the target tree with placeholder referring to subtrees; the right hand tree is spliced in where the left tree matched, and the placeholder subtrees transferred. So, you the programmer see surface sytax that you know and love; the tool works with trees and so it isn't confused by text.

Now, I don't think my rule matches exactly your intent, but that's partly because I can't guess your actual intent. You can write other rules if this isn't what you wanted.

This rule is purely driven by syntax; one can add a semantic predicate (not shown) if you want more complicated conditions to apply to the rule (e.g, the variable has to be ones only in certain scopes you define), and that gets messier to say. But it is much simpler and far easier to read than C code that climbs over the AST (notice you didn't see the AST here?) and tries to figure all this out.

The parsing and prettyprinting happens before and after rule application; there's a lot of machinery required to implement all that, but that machinery is built into DMS (e.g., it has something like [but more powerful] than Bison built in), and for predefined domains such as SQL, all the pretty printing works has been preconfigured, too.

If you want to get a better sense of what it takes to go full cycle with DMS (define your own language parser, define a pretty printer, define complicated rules), here's a nice and complete example of defining and symbolically simplifying calculus using DMS.

Thank you very much, this is really an informative answer.I'll take my time to examine your approach.I appreciate your help. — Aymanadou, Jan 12 '12 at 08:22

YACC|BISON :How do I manipulate parse tree?

2 Answers2