I would like to create a regex to split a string in Java under the following constraints:
Split at any non-word character (keeping it as its own token), except for:
(a) Characters surrounded by ' '
(b) Any instance of := >= <= <> ..
So that for the following sample string:
print('*'); x := x - 100
I can get the following result in a String[]:
print
(
'*'
)
;
x
:=
x
-
100
This is the regex I have so far:
str.split("\\s+|"+
"(?=[^\\w'][^']*('[^']*'[^']*)*$)|" +
"(?<=[^\\w'])(?=[^']*('[^']*'[^']*)*$)|" +
"(?=('[^']*'[^']*)*$)|" +
"(?<=')(?=[^']*('[^']*'[^']*)*$)");
But this gives me the following result:
print
(
'*'
)
;
x
:
=      <-- This is the problem: it should be on the line above, joined to the :
x
-
100
UPDATE
I have now learned that it's not possible to achieve this using a regex.
However, I still cannot use any external libraries, frameworks, or lexers, and have to use built-in Java methods, such as StringTokenizer.
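For illustration, one way to meet these constraints with only built-in classes is a small hand-written scanner. The following is a minimal sketch, not a definitive implementation. The class name SimpleTokenizer and the methods tokenize and isWordChar are hypothetical names chosen for this example. It assumes that word characters are ASCII letters, digits, and underscore (the default meaning of \w), and that an unclosed quote runs to the end of the input.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class; the name is an assumption for this sketch.
public class SimpleTokenizer {

    // Two-character operators from the question that must stay together.
    private static final List<String> TWO_CHAR_OPS =
            Arrays.asList(":=", ">=", "<=", "<>", "..");

    public static String[] tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int n = input.length();
        int i = 0;
        while (i < n) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                // Whitespace separates tokens and is dropped.
                i++;
            } else if (c == '\'') {
                // Quoted text: keep everything up to the closing quote as one token.
                // Assumption: an unclosed quote extends to the end of the input.
                int close = input.indexOf('\'', i + 1);
                int end = (close < 0) ? n : close + 1;
                tokens.add(input.substring(i, end));
                i = end;
            } else if (isWordChar(c)) {
                // A run of word characters forms one token.
                int start = i;
                while (i < n && isWordChar(input.charAt(i))) {
                    i++;
                }
                tokens.add(input.substring(start, i));
            } else if (i + 1 < n && TWO_CHAR_OPS.contains(input.substring(i, i + 2))) {
                // Check two-character operators before single characters.
                tokens.add(input.substring(i, i + 2));
                i += 2;
            } else {
                // Any other non-word character is a token on its own.
                tokens.add(String.valueOf(c));
                i++;
            }
        }
        return tokens.toArray(new String[0]);
    }

    // Assumption: same character set as the regex \w without Unicode flags.
    private static boolean isWordChar(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                || (c >= '0' && c <= '9') || c == '_';
    }

    public static void main(String[] args) {
        String[] result = tokenize("print('*'); x := x - 100");
        System.out.println(String.join("\n", result));
    }
}
```

The two-character operators are tested before the single-character fallback, which is what keeps := from being split into : and =. For the sample string above, this sketch yields print, (, '*', ), ;, x, :=, x, -, 100.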