Boolean Query / Expression to a Concrete syntax tree

Question

I'm creating a search form that allows boolean expressions, like: "foo AND bar" or "foo AND NOT bar".

Is there a library for PHP, Ruby or Java that can transform boolean expressions to a concrete syntax tree?

(I could write my own lexer/parser, but I rather use something tried and tested)

EDIT: To clarify, I'm not parsing arrhythmic expressions. It's going to be used for parsing full text queries that allow boolean operators.

score 2 · Answer 1 · answered Apr 10 '13 at 07:14

I know this question is almost three years old now, but I recently put together a library in Java specifically to manipulate boolean expressions: jbool_expressions.

It includes a tool too parse expressions out of string input:

Expression<String> expr = ExprParser.parse("( ( (! C) | C) & A & B)")

You can also do some fairly simple simplification:

Expression<String> simplified = RuleSet.simplify(expr);
System.out.println(expr);

gives

(A & B)

Hope this helps.

score 2 · Answer 2 · answered Aug 24 '10 at 14:25

For most parser framework, one of the standard introductory grammars is usually a mathematical expression parser. It parses things like 3+4*5-(6/-7), etc. Some tutorials would go even further and show you how to build the syntax tree and evaluate the expression.

A typical boolean expression grammar is usually directly analogous to mathematical expression grammar in this infix notation.

AND and OR are binary operators (i.e. they take 2 operands), just like + and *
NOT is a unary operator (it takes 1 operand), just like the unary -
Some operators have higher precedence than others
You may use (…) to group subexpressions

You should therefore be able to go through the calculator tutorial of most parser package and adapt it to your problem. In fact, in many ways boolean expression parsing is simpler because e.g.:

In mathematical expressions, the - operator is overloaded (may be binary or unary)
Some operator, e.g. exponentiation, is right-associative

In boolean expression grammar, usually the operators aren't overloaded, and everything is left-associative.

Links

antlr.org - Calculator Tutorial

score 2 · Accepted Answer · answered Aug 27 '10 at 15:51

2

I recommend Treetop. It's a minilanguage that generates a parser for your PEG (Parsing Expression Grammar). It's easier to work with than a LALR grammar and more powerful than a recursive descent parser (i.e. it can do some lookaheads more than one symbol).

Although Treetop isn't that complex, it helps to have some simple examples to to start with. You will find them at threedaymonk's treetop examples.

answered Aug 27 '10 at 15:51

Wayne Conrad

103,207
26
155
191

To be honest, the recursive descent parser is the most flexible parser. LL(*) is for this reason a synonym to recursive descent. You can implement any lookahead (that's why there is a star and not a number as the amount of lookahead is variable at any time) just as you wish and need. A PEG implementation can be more efficient due to memoization (which you have to implement by hand for a LL(*) parser) and other optimization techniques. Just my 2 cents. :) – bash0r Feb 01 '16 at 17:08
@bash0r These days I'm using [rattler](https://rubygems.org/gems/rattler), which has a more capable grammar. It'll even handle left recursion. I'm not sure I understand why recursive descent is more flexible--my understanding is that a PEG is a kind of recursive descent compiler. – Wayne Conrad Feb 01 '16 at 17:14
You can switch the algorithm to process data just right in place. You can't do that in PEG without implementing it by hand -> recursive descent. So in fact you are more flexible with handwirtten parsers. I'm not saying that you should do it by hand as it's a bunch of work... – bash0r Feb 02 '16 at 08:30

score 1 · Answer 4 · answered Aug 24 '10 at 13:27

1

just stumbled over this while looking for a sat solver... http://jboolexpr.sourceforge.net/ may be just what you need.

answered Aug 24 '10 at 13:27

tarrasch

2,630
8
37
61

score 0 · Answer 5 · answered Aug 24 '10 at 13:55

0

$patterns=array(
   '/\band\b/i',
   '/\bor\b/i',
   '/\bfoo\b/i',
   '/\bbar\b/i',
   '/\bnot\b/i');
$replace=array(
   '*',  // value for 'and'
   '+',  // value for 'or'
   '1',  // value for 'foo'
   '0',  // value for 'bar'
   '!'); // value for 'not'
$expression="foo AND NOT bar";
$result=eval(preg_replace($pattern, $replace, $expression));

You can easily add your own facts, it will automatically handle brackets and precedence. It might be a good idea to think about how you'll handle unexpected inputs in $expression (hint the replaced version should only contain 1, 0, +, *, (, ) and !)

C.

answered Aug 24 '10 at 13:55

symcbean

47,736
6
59
94

"How you'll handle unexpected inputs in `$expression`" is actually a huge problem with this method. With `foo` and `bar` almost certainly being placeholders for real search terms, it seems like it'd be quite complicated to handle every case. And absent some pretty severe filtering, one could easily inject nearly-arbitrary PHP code via the search term. – cHao Apr 10 '13 at 07:25
No, it's trivial to implement the required filtering - I even told you how to do it: `the replaced version should only contain 1, 0, +, *, (, ) and !` – symcbean Apr 10 '13 at 22:43
Problem is, that won't work if anyone will ever be searching for anything but "foo" or "bar", and probably not even then. Since the point is to parse text queries, you'd have to account for every search term, and turning everything into a `1` or `0` simply isn't going to work for that. Particularly considering that if this is going to...say...be used to generate an SQL query, `1` and `0` will be useless for matching against the text in the DB. – cHao Apr 10 '13 at 23:11

Boolean Query / Expression to a Concrete syntax tree

5 Answers5

Links

Linked