I have a string that needs to be split on a delimiter (:). The delimiter can be escaped by a character (say '?'), and it can be preceded by any number of escape characters. Consider the example string below:

a:b?:c??:d???????:e

After the split, it should give the following list of strings:

a 
b?:c?? 
d???????:e

In other words, if the delimiter (:) is preceded by an even number of escape characters (including zero), it should split; if it is preceded by an odd number of escape characters, it should not. Is there a way to do this with a regex? Any help would be greatly appreciated.

A similar question has been asked earlier here, but the answers do not work for this use case.

Update: The regex (?:\?.|[^:?])* splits the string correctly. However, it also returns a few spurious empty strings. If + is used instead of *, the genuine empty fields are dropped as well (e.g. a::b gives only a and b).

Shankar

1 Answer

Scenario 1: No empty matches

You may use

(?:\?.|[^:?])+

Or, following the pattern in the linked answer

(?:\?.|[^:?]++)+

See this regex demo

Details

  • (?: - start of a non-capturing group
    • \?. - a ? (the escape char) followed by any char, so an escaped : is consumed as part of the match
    • | - or
    • [^:?] - any char but the : (your delimiter char) and ? (the escape char)
  • )+ - 1 or more repetitions.

In Java:

String regex = "(?:\\?.|[^:?]++)+";

In case the input contains line breaks, prepend the pattern with (?s) (like (?s)(?:\\?.|[^:?])+) or compile the pattern with the Pattern.DOTALL flag, since . does not match line break chars by default.
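To try the Scenario 1 pattern end to end, here is a minimal sketch. The class name EscapedFieldMatcher and the helper fields are made up for illustration; only the regex comes from the answer. It collects the matches for the string from the question and for a::b, where the empty field is skipped as expected in this scenario.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; only the regex itself comes from the answer.
final class EscapedFieldMatcher {
    // Either an escape char plus the char it escapes, or a run of ordinary chars.
    private static final Pattern FIELD = Pattern.compile("(?:\\?.|[^:?]++)+");

    // Collects every non-empty field; empty fields (as in "a::b") are skipped.
    static List<String> fields(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = FIELD.matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(fields("a:b?:c??:d???????:e")); // [a, b?:c??, d???????:e]
        System.out.println(fields("a::b"));                // [a, b]
    }
}
```

The matches are collected with find() rather than String.split, because the pattern describes the fields themselves and not the delimiter.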

Scenario 2: Empty matches included

You may add a (?<=:)(?=:) alternative to the above pattern to match empty strings between : chars, see this regex demo:

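import java.util.regex.Matcher;
import java.util.regex.Pattern;

String s = "::a:b?:c??::d???????:e::";
// First alternative: a non-empty field; second: an empty position between two colons
Pattern pattern = Pattern.compile("(?>\\?.|[^:?])+|(?<=:)(?=:)");
Matcher matcher = pattern.matcher(s);
while (matcher.find()) {
    System.out.println("'" + matcher.group() + "'");
}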

Output of the Java demo:

''
'a'
'b?:c??'
''
'd???????:e'
''

Note that if you want to also match empty strings at the start/end of the string, use (?<![^:])(?![^:]) rather than (?<=:)(?=:).
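As a sketch of that last variant, the snippet below swaps in (?<![^:])(?![^:]) so that empty fields at the very start and end of the input are reported too. The class name EmptyAwareFieldMatcher and the helper fields are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the pattern is the start/end-aware variant from the note above.
final class EmptyAwareFieldMatcher {
    // A non-empty field, or an empty position with no non-colon char on either side
    // (so the start and end of the input also qualify).
    private static final Pattern FIELD =
            Pattern.compile("(?>\\?.|[^:?])+|(?<![^:])(?![^:])");

    static List<String> fields(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = FIELD.matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints 8 fields: '', '', 'a', 'b?:c??', '', 'd???????:e', '', ''
        for (String field : fields("::a:b?:c??::d???????:e::")) {
            System.out.println("'" + field + "'");
        }
    }
}
```

One thing to check against real data: both lookaround alternatives only inspect the neighbouring characters, so they cannot tell an escaped : from a real delimiter. For an input such as a?::b, this sketch appears to report an extra empty field right after a?: (the result is a?:, an empty string, then b).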

Wiktor Stribiżew