0

I have a set of incoming records, that needs to be evaluated under a set of logical clauses defined and stored. An example logical clause be like :

Acct1 != 'Y' AND Acct2 > 1004 AND Acct3 >= 96 AND Acct4 < 1004 AND Acct5 = 99 AND ((Acct6 <= 9090 OR Acct7 IN (A1,A2,A6) AND Acct1 NOT IN (A3,A4)) AND Formatted LIKE 'LINUX' AND Acct9 NOT LIKE 'WINDOWS' AND (Acct10 = 'N' AND NOT Acct11 = 'N') AND EditableField BETWEEN (10 AND 20) )

My data input to the clause be like :

map.put(Acct1,"Y")
map.put(Acct2,1010)
map.put(Acct3,99)
map.put(Acct4,1015)
map.put(Acct5,99)
map.put(Acct6,9090)
map.put(Acct7,"A3")
map.put(Formatted,"LINUX_INST")
map.put(Updated,"LINUX_TMP")
map.put(Acct10,"Y")
map.put(Acct11,"N")
map.put(EditableFIeld,25)

I have to evaluate the incoming records populated into the map onto the clause defined above and print true or false based on the evaluation result.

The clause conditions and map values will be changed and executed as well.

I have the following conditional clauses to be evaluated:

!=
>
>=
<
=
<=
IN(
NOT IN(
LIKE(
NOT LIKE(
BETWEEN(
AND
OR
AND NOT
OR NOT

I have tried using grammar generators but I am told that it is not a recommended solution for our application hence I am looking for java code and I have this detailed example for reference to AND,OR,=. resolving logical operations - AND, OR, looping conditions dynamically and looking for snippets to build on top of that if possible.

Community
  • 1
  • 1
Atom
  • 768
  • 1
  • 15
  • 35
  • What you did and what didn't work? – kosa Sep 22 '15 at 21:00
  • 1
    I have tried using grammar generators as I am not able to code such complex operators in java, but that is not accepted because of technical constraints. Hence looking for snippets if available for evaluating conditions in java. – Atom Sep 22 '15 at 21:04
  • 1
    Do you know _which_ constraints make a parser generator infeasible for your case? Some parser generators generate pure code with no standard library, and are usually tuned to be quite efficient in speed and storage. If parser generators aren't feasible, knowing the reasons may help someone avoid writing a detailed answer that is infeasible for those same reasons. – Jeff Bowman Sep 22 '15 at 21:25
  • @JeffBowman If it is going to be pure java code, then there isn't an issue with it. The issue arises if it derives third party libraries, which aren't recommended for certain apps. – Atom Sep 22 '15 at 21:49
  • JavaCC is one such generator that doesn't rely on a library at runtime, although probably overkill, as is the code in the link you provided. You don't need classes for this, just several levels of recursive-descent parsing functions. Examples abound. – user207421 Sep 22 '15 at 22:23
  • Why is the `'96'` quoted? – user207421 Sep 22 '15 at 22:30
  • @EJP Thanks for your inputs. Please point me towards one such example where several levels of descent parsing functions are used. I shall see if it helps my cause. – Atom Sep 22 '15 at 22:31
  • Your request is off topic. – user207421 Sep 22 '15 at 22:32
  • @EJP my mistake. I have edited '96' to 96. – Atom Sep 22 '15 at 22:33
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/90374/discussion-between-minion-and-ejp). – Atom Sep 22 '15 at 22:38
  • 1
    You can easily code a recursive descent parser that will evaluate (logical) expressions. See http://stackoverflow.com/a/2336769/120163 – Ira Baxter Sep 22 '15 at 23:10

2 Answers2

5

If you want to avoid a parser generator, consider using a StreamTokenizer to implement a recursive descent parser, with one method for each grammar rule.

For a subset of your grammar, this should look roughly like this (and should be straightforward to extend to your full grammar):

public class Parser {

  public static Node parse(String expr) {
    StreamTokenizer tokenizer = 
        new StreamTokenizer(new StringReader(expr));
    tokenizer.nextToken();
    Parser parser = new Parser(tokenizer);
    Node result = parser.parseExpression();
    if (tokenizer.ttype != StreamTokenizer.TT_EOF) {
      throw new RuntimeException("EOF expected, got " 
          + tokenizer.ttype + "/" + tokenizer.sval);
  }

  private StreamTokenizer tokenizer;

  private Parser(StreamTokenizer tokenizer) {
    this.tokenizer = tokenizer;
  } 

  private Node parseExpression() {
    Node left = parseAnd();
    if (tokenizer.ttype == StreamTokenizer.TT_WORD
        && tokenizer.sval.equals("OR")) {
      tokenizer.nextToken();
      return new OperationNode(OperationNode.Type.OR, 
          left, parseExpression());
    }
    return left;
  }

  private Node parseAnd() {
    Node left = parseRelational();
    if (tokenizer.ttype == StreamTokenizer.TT_WORD
        && tokenizer.sval.equals("AND")) {
      tokenizer.nextToken();
      return new OperationNode(OperationNode.Type.AND, 
          left, parseAnd());
    }
    return left;
  }

  private Node parseRelational() {
    Node left = parsePrimary();
    OperationNode.Type type;
    switch (tokenizer.ttype) {
      case '<': type = OperationNode.Type.LESS; break;
      case '=': type = OperationNode.Type.EQUAL; break;
      case '>': type = OperationNode.Type.GREATER; break;
      default:  
        return left;
    }
    tokenizer.nextToken();
    return new OperationNode(type, left, parseRelational());
  }

  private Node parsePrimary() {
    Node result;
    if (tokenizer.ttype == '(') {
      tokenizer.nextToken();
      result = parseExpression();
      if (tokenizer.ttype != ')') {
        throw new RuntimeException(") expected, got "
          + tokenizer.ttype + "/" + tokenizer.sval);
       }
    } else if (tokenizer.ttype == '"' || tokenizer.ttype == '\'') {
      result = new LiteralNode(tokenizer.sval);
    } else if (tokenizer.ttype == TT_NUMBER) {
      result = new LiteralNode(tokenizer.nval);
    } else if (tokenizer.ttype == StreamTokenizer.TT_WORD) {
      result = new FieldNode(tokenizer.sval);
    } else {
      throw new RuntimeException("Unrecognized token: " 
          + tokenizer.ttype + "/" + tokenizer.sval);
    }
    tokenizer.nextToken();
    return result;
  }
}

This assumes a Node object hierarchy like this:

interface Node {
   Object eval(Map<String,Object> data);
}

class FieldNode implements Node {
   private String name; 
   FieldNode(String name) {
     this.name = name;
   }
   public Object eval(Map<String,Object> data) {
     return data.get(name);
   }
}

class LiteralNode implements Node {
   private Object value; 
   FieldNode(Object value) {
     this.value = value;
   }
   public Object eval(Map<String,Object> data) {
     return value;
   }
}

class OperationNode implements Node {
  enum Type {
    AND, OR, LESS, GREATER, EQUALS
  }
  private Type type;
  private Node leftChild;
  private Node rightChild;

  OperationNode(Type type, Node leftChild, Node rightChild) {
    this.type = type;
    this.leftChild = leftChild;
    this.rightChild = rightChild;
  }

  public Object eval(Map<String,Object> data) {
    Object left = leftChild.eval(data);
    Object right = rightChild.eval(data);
    switch (type) {
      case AND: return ((Boolean) left) && ((Boolean) right);
      case OR: return ((Boolean) left) || ((Boolean) right);
      case LESS: return ((Comparable) left).compareTo(right) < 0;
      case EQUALS: return left.equals(right);
      case GREATE: return ((Comparable) left).compareTo(right) > 0;
      default:
        throw new RuntimeException("Invalid op: " + type);
    }
  }    
Stefan Haustein
  • 18,427
  • 3
  • 36
  • 51
3

To directly answer the question, a number of SO questions (e.g. 1, 2) describe the basics of writing a parser by hand, though in practice it is very unusual to write a parser manually outside of university compiler courses due to the boilerplate and exacting detail involved.

As discussed in the comments, it sounds like the main reason to avoid grammar generators is to avoid a dependency on outside libraries. However, when using a grammar generator (parser generator) like JavaCC (Java Compiler-Compiler), there are no JAR files or outside dependencies involved: The JavaCC binary converts a grammar specification into Java code, that can be run without involving any further libraries.

See this IBM tutorial, JoAnn Brereton's "Use JavaCC to build a user friendly boolean query language" (via archive.org) as an example, which incidentally involves a grammar for a search language not unlike yours.

Example inputs:

actor = "Christopher Reeve" and keyword=action and keyword=adventure
(actor = "Christopher Reeve" and keyword=action) or keyword=romance
actor = "Christopher Reeve" and (keyword=action or keyword=romance)

Grammar excerpts:

TOKEN : 
{
<STRING : (["A"-"Z", "0"-"9"])+ >
<QUOTED_STRING: "\"" (~["\""])+ "\"" >
}

void queryTerm() :
{
}
{
        (<TITLE> | <ACTOR> |
         <DIRECTOR> | <KEYWORD>)
        ( <EQUALS> | <NOTEQUAL>)
        ( <STRING> | <QUOTED_STRING> )
        |
       <LPAREN> expression() <RPAREN>
}

Output files:

  • UQLParser.java
  • UQLParserConstants.java
  • UQLParserTokenManager.java
  • TokenMgrError.java
  • ParseException.java
  • Token.java
  • SimpleCharStream.java

This is one of several parser generators that you can consider; others, like yacc and bison, also generate standalone Java files without requiring outside libraries. If necessary, you can check the generated Java files directly into your repository, leaving the .jj compiler source file only if you need to adjust the syntax. (Though it would probably be better to compile freshly from source as part of your build process and avoid checking generated files into source control, this may better suit your constraints of a Java-only solution.)

Jeff Bowman
  • 90,959
  • 16
  • 217
  • 251