The preferred approach is at the end of this answer.
It seems you are looking for the look-around mechanism.
For instance, if you want to split on whitespace that has no `foo` before it and no `bar` after it, your code can look like this:
split("(?<!foo)\\s(?!bar)")
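As a minimal, runnable sketch of the look-around split above (the class name, method name, and sample input are made up for illustration):

```java
import java.util.Arrays;

public class LookAroundSplit {
    // Splits on a single whitespace character that is neither
    // preceded by "foo" nor followed by "bar".
    public static String[] splitOutsideFooBar(String input) {
        return input.split("(?<!foo)\\s(?!bar)");
    }

    public static void main(String[] args) {
        String[] parts = splitOutsideFooBar("a b foo c bar d");
        System.out.println(Arrays.toString(parts)); // [a, b, foo c bar, d]
    }
}
```

The space after `foo` is rejected by the negative look-behind and the space before `bar` is rejected by the negative look-ahead, so `foo c bar` stays in one piece.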
Update (assuming that `[...]` blocks can't be nested and that they are well formed, i.e. every `[` is closed with a `]`):
Your case seems a little more complex. What you can do is accept a `,` if
- it doesn't have any `[` or `]` after it,
- or the first opening bracket `[` after this comma has no closing bracket `]` between the comma and itself.
Otherwise the comma would be inside a bracketed area, like this:
[ , ] [
  ^ ^ ^ - first `[` after tested comma
  | +---- one `]` between tested comma and first `[` after it
  +------ tested comma
So your code can look like this
(this is the original version; a slightly simplified one is further down):
split(",(?=[^\\]]*(\\[|$))")
This regex is based on the idea that the commas you don't want to accept are inside `[foo,bar]`. But how do we determine whether we are inside (or outside) such a block?
- If a character is inside, there will be no `[` after it until we find a `]`. The next `[` can appear only after that `]`. For example, in `[a,b],[c,d]` the comma between `a` and `b` has no `[` after it until the `]`, but a new `[..]` area can follow, which of course starts with `[`.
- If a character is outside a `[...]` area, then only non-`]` characters can appear after it until we reach the start of a `[...]` area or the end of the string.
The second case is the one you are interested in. So we need a regex that accepts a `,` which has only non-`]` characters after it (so it is not inside `[...]`) until it finds a `[` or reaches the end of the string (represented by `$`).
Such a regex can be written as
- `,` a comma
- `(?=...)` which has after it
  - `[^\\]]*(\\[|$)`
    - `[^\\]]*` zero or more non-`]` characters (`]` needs to be escaped because it is a metacharacter)
    - `(\\[|$)` followed by `[` (it also needs to be escaped in a regex) or the end of the string
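The breakdown above can be tried out with a small sketch like the following. The class name, constant, and sample inputs are illustrative only, and the same assumption applies: no nested brackets, and every `[` is closed.

```java
import java.util.Arrays;

public class LookaheadCommaSplit {
    // Accept a comma only if, after zero or more non-']' characters,
    // we reach either '[' or the end of the string.
    private static final String OUTSIDE_BRACKETS = ",(?=[^\\]]*(\\[|$))";

    public static String[] splitTopLevel(String input) {
        return input.split(OUTSIDE_BRACKETS);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitTopLevel("a,b,[c,d],e")));
        // [a, b, [c,d], e]
        System.out.println(Arrays.toString(splitTopLevel("[a,b],[c,d]")));
        // [[a,b], [c,d]]
    }
}
```

Note that in Java the capturing group inside the look-ahead does not add extra elements to the result of `String.split`.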
A slightly simplified split version:
string.split(",(?![^\\[]*\\])");
Which means: split on a comma `,` that is not followed (the negation is expressed by `(?!...)`) by an unclosed `]`. An unclosed `]` is one that has no `[` between the tested comma and itself, which can be written as `[^\\[]*\\]`.
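A quick way to check the negative look-ahead version on a few inputs is a sketch like this one (class name, method name, and samples are hypothetical; brackets are assumed to be balanced and not nested):

```java
import java.util.Arrays;

public class NegativeLookaheadCommaSplit {
    // Reject a comma if, after zero or more non-'[' characters, a ']' follows,
    // i.e. the comma sits inside a [...] block that is still open.
    public static String[] splitTopLevel(String input) {
        return input.split(",(?![^\\[]*\\])");
    }

    public static void main(String[] args) {
        String[] samples = {"a,b,[c,d],e", "[a,b],[c,d]", "x,[y,z,w]"};
        for (String sample : samples) {
            System.out.println(sample + " -> " + Arrays.toString(splitTopLevel(sample)));
        }
    }
}
```

Only commas outside the brackets act as separators, so each `[...]` block comes back as a single element.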
Preferred approach
To avoid such a complex regex, don't use `split`. Use the `Pattern` and `Matcher` classes instead, and search for areas like `[...]` or for runs of non-comma characters.
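import java.util.regex.Matcher;
import java.util.regex.Pattern;

String string = "a,b,[c,d],e";
// Match either a whole [...] block or a run of non-comma characters.
Pattern p = Pattern.compile("\\[.*?\\]|[^,]+");
Matcher m = p.matcher(string);
while (m.find()) {
    System.out.println(m.group());
}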
Output:
a
b
[c,d]
e
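If the tokens are needed as a collection rather than printed, the same pattern could be wrapped in a small helper. This is only a sketch; `BracketAwareTokenizer` and `tokenize` are hypothetical names, not part of any library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BracketAwareTokenizer {
    // Either a whole [...] block (reluctant match up to the first ']')
    // or a run of one or more non-comma characters.
    private static final Pattern TOKEN = Pattern.compile("\\[.*?\\]|[^,]+");

    public static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("a,b,[c,d],e")); // [a, b, [c,d], e]
    }
}
```

One difference from the `split` versions worth keeping in mind: because `[^,]+` needs at least one character, empty fields are skipped, so `"a,,b"` yields `[a, b]` here rather than an empty string in the middle.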