
I have Java strings which are boolean expressions with parentheses, &, |, and ! as operators, and I want to split them into tokens. For example:

((!A1)&(B2|C3)) should become "(","(","!","A1",")","&","(","B2","|","C3",")",")"

Following this answer I found that I can use Java's String.split() with a regex that includes lookahead and lookbehind clauses:

String[] tokens = "((!A1)&(B2|C3))".split("((?<=[!&()|])|(?=[!&()|]))");

My only problem is that whitespace will be included in the list of tokens. For example, if I were to write the expression as ( ( !A1 ) & ( B2 | C3 ) ) then my split() would produce at least four strings like " " and there'd be padding around my variables (e.g. " B2 ").
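A minimal sketch that reproduces the problem. The class and method names are made up for illustration, and it assumes Java 8 or later, where split() does not emit a leading empty string for a zero-width match at the start of the input:

```java
// Hypothetical demo class; the name is illustrative only.
class SplitWhitespaceDemo {
    // Splits before and after every operator character, as in the question.
    static String[] splitAroundOperators(String expression) {
        return expression.split("((?<=[!&()|])|(?=[!&()|]))");
    }

    public static void main(String[] args) {
        String[] tokens = splitAroundOperators("( ( !A1 ) & ( B2 | C3 ) )");
        // Quote each token so the stray whitespace is visible.
        for (String token : tokens) {
            System.out.println("\"" + token + "\"");
        }
    }
}
```

The printed list includes whitespace-only tokens such as " " as well as padded names such as " B2 ".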

How can I modify this split expression and regex to tokenize the string but not keep any of the whitespace?

workerjoe

1 Answer


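Instead of split, you can use this regex to match the tokens you want:

[!&()|]|[^!&()|\h]+

Listing | in both character classes means that unspaced input such as B2|C3 is also split into three tokens.

RegEx Details:

  • [!&()|]: Match a single !, &, (, ) or |
  • |: OR (alternation between the two character classes)
  • [^!&()|\h]+: Match one or more characters that are NOT !, &, (, ), | or horizontal whitespace

Code:

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final String regex = "[!&()|]|[^!&()|\\h]+";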
final String string = "((!A1)&( B2 | C3 ))";

final Pattern pattern = Pattern.compile(regex);
final Matcher matcher = pattern.matcher(string);

List<String> result = new ArrayList<>();
while (matcher.find()) {
    result.add(matcher.group(0));
}

System.out.println(result); // [(, (, !, A1, ), &, (, B2, |, C3, ), )]
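
For reuse, the matching approach can be wrapped in a helper. The following is a minimal sketch rather than a definitive implementation: the class name BooleanTokenizer and the method names tokenize and tokenizeBySplit are made up for illustration. Note that | is listed in both character classes here so that unspaced input such as B2|C3 yields three tokens, and that \s is used instead of \h so tabs and newlines are skipped too. The second method is an alternative that keeps the split() from the question and strips whitespace first; it assumes variable names never contain whitespace.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the names are illustrative only.
class BooleanTokenizer {
    // One operator character, or a run of characters that are neither
    // operators nor whitespace (\s also covers tabs and newlines).
    private static final Pattern TOKEN = Pattern.compile("[!&()|]|[^!&()|\\s]+");

    // Zero-width split points before and after every operator (from the question).
    private static final String SPLIT_POINTS = "(?<=[!&()|])|(?=[!&()|])";

    // Collects every match; whitespace never matches, so it is skipped.
    static List<String> tokenize(String expression) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = TOKEN.matcher(expression);
        while (matcher.find()) {
            tokens.add(matcher.group());
        }
        return tokens;
    }

    // Alternative: delete all whitespace first, then split as in the question.
    static List<String> tokenizeBySplit(String expression) {
        String compact = expression.replaceAll("\\s+", "");
        return Arrays.asList(compact.split(SPLIT_POINTS));
    }

    public static void main(String[] args) {
        String spaced = "( ( !A1 ) & ( B2 | C3 ) )";
        System.out.println(tokenize(spaced));
        System.out.println(tokenizeBySplit(spaced));
    }
}
```

For non-empty expressions both methods give the same list. One difference to keep in mind: for an empty or whitespace-only input, tokenize returns an empty list, while tokenizeBySplit returns a list holding a single empty string, because that is what split() returns for an empty input.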
anubhava