4

I’m trying to build a grammar that interprets user-entered text, search-engine style. It will support the AND, OR, NOT and ANDNOT Boolean operators. I have pretty much everything working, but I want to add a rule that two adjacent keywords outside of a quoted string implicitly are treated as in an AND clause. For example:

cheese and crackers = cheese AND crackers

(up and down) or (left and right) = (up AND down) OR (left AND right)

cat dog “potbelly pig” = cat AND dog AND “potbelly pig”

I’m having trouble with the last one, and I’m hoping somebody can point me in the right direction. Here’s my *.g file thus far, and please be nice, my ANTLR experience spans less than a work day:

grammar SearchEngine;

options { language = CSharp2; output = AST; }

@lexer::namespace { Demo.SearchEngine }
@parser::namespace { Demo.SearchEngine }

LPARENTHESIS : '(';
RPARENTHESIS : ')';

AND    : ('A'|'a')('N'|'n')('D'|'d');
OR     : ('O'|'o')('R'|'r');
ANDNOT : ('A'|'a')('N'|'n')('D'|'d')('N'|'n')('O'|'o')('T'|'t');
NOT    : ('N'|'n')('O'|'o')('T'|'t');

fragment CHARACTER : ('a'..'z'|'A'..'Z'|'0'..'9');
fragment QUOTE     : ('"');
fragment SPACE     : (' '|'\n'|'\r'|'\t'|'\u000C');

WS     : (SPACE) { $channel=HIDDEN; };
PHRASE : (QUOTE)(CHARACTER)+((SPACE)+(CHARACTER)+)+(QUOTE);
WORD   : (CHARACTER)+;

startExpression  : andExpression;
andExpression    : andnotExpression (AND^ andnotExpression)*;
andnotExpression : orExpression (ANDNOT^ orExpression)*;
orExpression     : notExpression (OR^ notExpression)*;
notExpression    : (NOT^)? atomicExpression;
atomicExpression : PHRASE | WORD | LPARENTHESIS! andExpression RPARENTHESIS!;
user409108
  • 53
  • 2

1 Answers1

6

Since your AND-rule has the optional AND-keyword, you should create an imaginary AND-token and use a rewrite-rule to "inject" that token in your tree. In this case, you can't make use of ANTLR's short-hand ^ root-operator. You'll have to use the -> rewrite operator.

Your andExpression should look like:

andExpression
  :  (andnotExpression        -> andnotExpression)
     (AND? a=andnotExpression -> ^(AndNode $andExpression $a))* 
  ;

A detailed description of this (perhaps cryptic) notation is given in Chapter 7, section Rewrite Rules in Subrules, page 173-174 of The Definitive ANTLR Reference by Terence Parr.

I ran a quick test to see if the grammar produces the proper AST with the new andExpression rule. After parsing the string cat dog "potbelly and pig" and FOO, the generated parser produced the following AST:

alt text http://img580.imageshack.us/img580/7370/andtree.png

Note that the AndNode and Root are imaginary tokens.

If you want to know how to create the AST picture above, see this thread: Visualizing an AST created with ANTLR (in a .Net environment)

EDIT

When parsing both one two three and (one two) three, the following AST is created:

alt text http://img203.imageshack.us/img203/2558/69551879.png

And when parsing (one two) OR three, the following AST is created:

alt text http://img340.imageshack.us/img340/8779/73390353.png

which seems to be the proper way in all cases.

Community
  • 1
  • 1
Bart Kiers
  • 166,582
  • 36
  • 299
  • 288
  • Thanks! That did the trick. My follow-up concern (now removed, and what your edit was directed at) was due to a bug in some enclosing C# code. – user409108 Aug 03 '10 at 18:31
  • @BartKiers Iam having problems understanding your rewrite rule, even though i have the reference here. Could you please explain why the (..)(..) part is needed and what the dollar operator does? – Th 00 mÄ s Nov 28 '12 at 13:31
  • @ThomAS, I don't think I can word it better than the supreme overlord of ANTLR... :) – Bart Kiers Nov 28 '12 at 19:23
  • @BartKiers can you name this construct so i can look it up? Iam having troubles to pin down the right documentation for parenthesis enclosed terms and the dollar operator. – Th 00 mÄ s Nov 30 '12 at 08:57
  • 1
    @ThomAS, I don't know of an official name for this. I think of it as a "recursive reference" (`$andExpression` being the recursive reference in the rule `andExpression`). – Bart Kiers Nov 30 '12 at 09:27