I am trying to use java.util.Scanner to tokenize an arithmetic expression, where the delimiters can be either:
- Whitespace (\s+ or \p{Space}+), which should be discarded
- Punctuation (\p{Punct}), which should be returned as tokens
Example
Given this expression:
12 + (ab-bc*3)
I would like Scanner to return these tokens:
12
+
(
ab
-
bc
*
3
)
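For comparison, here is a minimal sketch that produces this token sequence by describing the tokens rather than the delimiters. It is a swapped-in technique, not a delimiter-based solution. It assumes Java 9+ for Scanner.findAll, and it assumes operands are runs of letters and digits (\p{Alnum}+). The class name FindAllTokenizer and the TOKEN pattern are illustrative only.

```java
import java.util.List;
import java.util.Scanner;
import java.util.regex.MatchResult;
import java.util.stream.Collectors;

public class FindAllTokenizer {
    // Assumption: an operand is a run of letters/digits, and every other
    // token is a single punctuation character. Whitespace never matches,
    // so it is skipped implicitly.
    private static final String TOKEN = "\\p{Alnum}+|\\p{Punct}";

    static List<String> tokenize(String expression) {
        // Scanner.findAll (Java 9+) acts like repeated findWithinHorizon(pattern, 0)
        // calls, so the scanner's delimiter pattern plays no role here.
        try (Scanner scanner = new Scanner(expression)) {
            return scanner.findAll(TOKEN)
                    .map(MatchResult::group)
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) {
        // prints [12, +, (, ab, -, bc, *, 3, )]
        System.out.println(tokenize("12 + (ab-bc*3)"));
    }
}
```

The stream is collected inside the try block, so it is fully consumed before the scanner is closed.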
Code
So far, I have only been able to:
- Consume all of the punctuation characters as delimiters (not what I wanted):
new Scanner("12 + (ab-bc*3)").useDelimiter("\\p{Space}+|\\p{Punct}").tokens().collect(Collectors.toList())
- Result:
"12", "", "", "", "ab", "bc", "3"
- Achieve partial success using a positive lookahead:
new Scanner("12 + (ab-bc*3)").useDelimiter("\\p{Space}+|(?=\\p{Punct})").tokens().collect(Collectors.toList())
- Result:
"12", "+", "(ab", "-bc", "*3", ")"
But now I am stuck.
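One way to extend the lookahead attempt is to add a matching lookbehind. The zero-width split then happens on both sides of every punctuation character, while \p{Space}+ still consumes (and so discards) whitespace. Scanner skips a zero-length delimiter match found at the current position and moves on to the next match, which is why no empty tokens appear. This is a sketch, assuming Java 9+ for Scanner.tokens(). The class name LookaroundTokenizer is illustrative.

```java
import java.util.List;
import java.util.Scanner;
import java.util.stream.Collectors;

public class LookaroundTokenizer {
    // Delimiter alternatives:
    //   \p{Space}+        a run of whitespace (consumed, so it is discarded)
    //   (?<=\p{Punct})    the empty position just AFTER a punctuation character
    //   (?=\p{Punct})     the empty position just BEFORE a punctuation character
    private static final String DELIMITER =
            "\\p{Space}+|(?<=\\p{Punct})|(?=\\p{Punct})";

    static List<String> tokenize(String expression) {
        try (Scanner scanner = new Scanner(expression).useDelimiter(DELIMITER)) {
            // Scanner.tokens() (Java 9+) streams the tokens between delimiters.
            return scanner.tokens().collect(Collectors.toList());
        }
    }

    public static void main(String[] args) {
        // prints [12, +, (, ab, -, bc, *, 3, )]
        System.out.println(tokenize("12 + (ab-bc*3)"));
    }
}
```

The same pattern passed to String.split would yield empty strings wherever whitespace sits next to punctuation (for example, between the space and "+"), because split does not skip the zero-width match that follows a consumed delimiter.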