Here's a complete solution that does not rely on third-party libraries like ANTLR or JavaCC. Note that while it's extensible, its capabilities are still limited. If you want to handle much more complex expressions, you'd be better off using a parser generator.
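All of the snippets below are assumed to live in one outer class; a possible skeleton with the imports they need (the class name is my own choice, not part of the original) could be:

import java.text.ParseException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class ExpressionParser {
    // TokenType, Token, TokenStream, tokenize(), the Expr classes
    // and parse() from the snippets below all go in here
}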
First, let's write a tokenizer which splits the input string into tokens. Here are the token types:
private static enum TokenType {
    WHITESPACE, AND, OR, EQUALS, LEFT_PAREN, RIGHT_PAREN, IDENTIFIER, LITERAL, EOF
}
The token class itself:
private static class Token {
    final TokenType type;
    final int start; // start position in input (for error reporting)
    final String data; // payload

    public Token(TokenType type, int start, String data) {
        this.type = type;
        this.start = start;
        this.data = data;
    }

    @Override
    public String toString() {
        return type + "[" + data + "]";
    }
}
To simplify the tokenization, let's create a regexp that reads the next token from the input string:
private static final Pattern TOKENS =
    Pattern.compile("(\\s+)|(AND)|(OR)|(=)|(\\()|(\\))|(\\w+)|'([^']+)'");
Note that it has one capturing group per TokenType, in the same order (first comes WHITESPACE, then AND, and so on). Finally, the tokenizer method:
private static TokenStream tokenize(String input) throws ParseException {
    Matcher matcher = TOKENS.matcher(input);
    List<Token> tokens = new ArrayList<>();
    int offset = 0;
    TokenType[] types = TokenType.values();
    while (offset != input.length()) {
        if (!matcher.find() || matcher.start() != offset) {
            throw new ParseException("Unexpected token at " + offset, offset);
        }
        for (int i = 0; i < types.length; i++) {
            if (matcher.group(i + 1) != null) {
                if (types[i] != TokenType.WHITESPACE)
                    tokens.add(new Token(types[i], offset, matcher.group(i + 1)));
                break;
            }
        }
        offset = matcher.end();
    }
    tokens.add(new Token(TokenType.EOF, input.length(), ""));
    return new TokenStream(tokens);
}
I'm using java.text.ParseException. Here we apply the regex Matcher until the end of the input. If it doesn't match at the current position, we throw an exception. Otherwise we look up the group that matched and create a token from it, skipping WHITESPACE tokens. Finally we add an EOF token which marks the end of the input. The result is returned as a special TokenStream object. Here's the TokenStream class which will help us do the parsing:
private static class TokenStream {
    final List<Token> tokens;
    int offset = 0;

    public TokenStream(List<Token> tokens) {
        this.tokens = tokens;
    }

    // consume next token of given type (throw exception if type differs)
    public Token consume(TokenType type) throws ParseException {
        Token token = tokens.get(offset++);
        if (token.type != type) {
            throw new ParseException("Unexpected token at " + token.start
                    + ": " + token + " (was looking for " + type + ")",
                    token.start);
        }
        return token;
    }

    // consume token of given type (return null and don't advance if type differs)
    public Token consumeIf(TokenType type) {
        Token token = tokens.get(offset);
        if (token.type == type) {
            offset++;
            return token;
        }
        return null;
    }

    @Override
    public String toString() {
        return tokens.toString();
    }
}
So we have a tokenizer, hoorah. You can test it right away:

System.out.println(tokenize("Acct1 = 'Y' AND (Acct2 = 'N' OR Acct3 = 'N')"));
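This should print something like the following (the exact rendering comes from Token.toString above: whitespace is skipped, quotes are stripped from literals, and the artificial EOF token has an empty payload):

[IDENTIFIER[Acct1], EQUALS[=], LITERAL[Y], AND[AND], LEFT_PAREN[(], IDENTIFIER[Acct2], EQUALS[=], LITERAL[N], OR[OR], IDENTIFIER[Acct3], EQUALS[=], LITERAL[N], RIGHT_PAREN[)], EOF[]]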
Now let's write the parser which will create a tree-like representation of our expression. First, the Expr interface implemented by all the tree nodes:
public interface Expr {
    public boolean evaluate(Map<String, String> data);
}
Its only method evaluates the expression against a given data set and returns true if the data set matches.
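In other words, an Expr is essentially a predicate over one row of data. Just to illustrate the contract (this hand-written expression is hypothetical and not produced by the parser), on Java 8 you could even write one as a lambda:

Expr acct1IsY = data -> "Y".equals(data.get("Acct1"));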
The most basic expression is EqualsExpr, which handles comparisons like Acct1 = 'Y' or 'Y' = Acct1:
private static class EqualsExpr implements Expr {
    private final String identifier, literal;

    public EqualsExpr(TokenStream stream) throws ParseException {
        Token token = stream.consumeIf(TokenType.IDENTIFIER);
        if (token != null) {
            this.identifier = token.data;
            stream.consume(TokenType.EQUALS);
            this.literal = stream.consume(TokenType.LITERAL).data;
        } else {
            this.literal = stream.consume(TokenType.LITERAL).data;
            stream.consume(TokenType.EQUALS);
            this.identifier = stream.consume(TokenType.IDENTIFIER).data;
        }
    }

    @Override
    public String toString() {
        return identifier + "='" + literal + "'";
    }

    @Override
    public boolean evaluate(Map<String, String> data) {
        return literal.equals(data.get(identifier));
    }
}
The toString() method is just for information; you can remove it.
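If you want to check EqualsExpr on its own before the rest of the parser exists, you can feed it a freshly tokenized comparison (just an illustration; it assumes the code runs inside the same class and adds a java.util.Collections import):

Expr eq = new EqualsExpr(tokenize("'Y' = Acct1"));
System.out.println(eq.evaluate(Collections.singletonMap("Acct1", "Y"))); // true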
Next we will define the SubExpr class, which is either an EqualsExpr or something more complex in parentheses (if we see an opening parenthesis):
private static class SubExpr implements Expr {
    private final Expr child;

    public SubExpr(TokenStream stream) throws ParseException {
        if (stream.consumeIf(TokenType.LEFT_PAREN) != null) {
            child = new OrExpr(stream);
            stream.consume(TokenType.RIGHT_PAREN);
        } else {
            child = new EqualsExpr(stream);
        }
    }

    @Override
    public String toString() {
        return "(" + child + ")";
    }

    @Override
    public boolean evaluate(Map<String, String> data) {
        return child.evaluate(data);
    }
}
Next is AndExpr, which is a set of SubExpr expressions joined by the AND operator:
private static class AndExpr implements Expr {
    private final List<Expr> children = new ArrayList<>();

    public AndExpr(TokenStream stream) throws ParseException {
        do {
            children.add(new SubExpr(stream));
        } while (stream.consumeIf(TokenType.AND) != null);
    }

    @Override
    public String toString() {
        return children.stream().map(Object::toString).collect(Collectors.joining(" AND "));
    }

    @Override
    public boolean evaluate(Map<String, String> data) {
        for (Expr child : children) {
            if (!child.evaluate(data))
                return false;
        }
        return true;
    }
}
I use the Java 8 Stream API in toString for brevity. If you cannot use Java 8, you may rewrite it with a plain loop or remove toString completely.
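For reference, a loop-based replacement for AndExpr.toString (producing the same output without the Stream API) could look like this:

@Override
public String toString() {
    // join the children with " AND ", same result as Collectors.joining(" AND ")
    StringBuilder sb = new StringBuilder();
    for (Expr child : children) {
        if (sb.length() > 0)
            sb.append(" AND ");
        sb.append(child);
    }
    return sb.toString();
}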
Finally we define OrExpr, which is a set of AndExpr expressions joined by OR (OR usually has lower priority than AND). It's very similar to AndExpr:
private static class OrExpr implements Expr {
    private final List<Expr> children = new ArrayList<>();

    public OrExpr(TokenStream stream) throws ParseException {
        do {
            children.add(new AndExpr(stream));
        } while (stream.consumeIf(TokenType.OR) != null);
    }

    @Override
    public String toString() {
        return children.stream().map(Object::toString).collect(Collectors.joining(" OR "));
    }

    @Override
    public boolean evaluate(Map<String, String> data) {
        for (Expr child : children) {
            if (child.evaluate(data))
                return true;
        }
        return false;
    }
}
And the final parse method:
public static Expr parse(TokenStream stream) throws ParseException {
    OrExpr expr = new OrExpr(stream);
    stream.consume(TokenType.EOF); // ensure that we parsed the whole input
    return expr;
}
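Putting it all together, the grammar this recursive-descent parser implements is roughly the following (informal notation of mine, not part of the code):

orExpr     := andExpr (OR andExpr)*
andExpr    := subExpr (AND subExpr)*
subExpr    := '(' orExpr ')' | equalsExpr
equalsExpr := IDENTIFIER '=' LITERAL | LITERAL '=' IDENTIFIER

Each nonterminal corresponds to one of the classes above, and parse() simply starts at the lowest-priority rule (orExpr) and then checks that the whole input was consumed.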
So you can parse your expressions to get Expr objects, then evaluate them against the rows of your CSV file. I assume that you're capable of parsing a CSV row into a Map<String, String>. Here's a usage example:
Map<String, String> data = new HashMap<>();
data.put("Acct1", "Y");
data.put("Acct2", "N");
data.put("Acct3", "Y");
data.put("Acct4", "N");
Expr expr = parse(tokenize("Acct1 = 'Y' AND (Acct2 = 'Y' OR Acct3 = 'Y')"));
System.out.println(expr.evaluate(data)); // true
expr = parse(tokenize("Acct1 = 'N' OR 'Y' = Acct2 AND Acct3 = 'Y'"));
System.out.println(expr.evaluate(data)); // false
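If it helps, here's a minimal sketch of the CSV side. It's based on my own assumptions rather than your actual file: a plain comma-separated file named data.csv with a header row, no quoted or escaped fields, and an enclosing method that declares throws IOException, ParseException (plus java.io.BufferedReader and java.io.FileReader imports). Use a proper CSV library for anything fancier.

// read the header, then turn each following line into a Map keyed by column name
// (assumes the file is non-empty and every line has the same number of cells)
List<Map<String, String>> rows = new ArrayList<>();
try (BufferedReader reader = new BufferedReader(new FileReader("data.csv"))) {
    String[] header = reader.readLine().split(",");
    String line;
    while ((line = reader.readLine()) != null) {
        String[] cells = line.split(",");
        Map<String, String> row = new HashMap<>();
        for (int i = 0; i < header.length && i < cells.length; i++) {
            row.put(header[i].trim(), cells[i].trim());
        }
        rows.add(row);
    }
}
Expr expr = parse(tokenize("Acct1 = 'Y' AND (Acct2 = 'Y' OR Acct3 = 'Y')"));
rows.removeIf(row -> !expr.evaluate(row)); // keep only the matching rows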