
I searched the web but could not find an answer that fits my requirement exactly. I have a string like this:

A|B|C|The Steading\|Keir Allan\|Braco|E

The output should look like this:

A
B
C
The Steading|Keir Allan|Braco
E

The delimiter must be skipped when it is preceded by the escape sequence. I tried the following negative lookbehind with String.split():

(?<!\\)\|

But the delimiter will be defined dynamically by the end user and need not always be |. It can be any character on the keyboard (no restrictions). Hence, I suspect the above regex might fail for characters that have a special meaning in regex.

I just want to know whether this is the right way to do it.

Alan Moore
user2757740

3 Answers


You can use Pattern.quote():

String regex = "(?<!\\\\)" + Pattern.quote(delim);

Using your example:

String delim = "|";
String regex = "(?<!\\\\)" + Pattern.quote(delim);

for (String s : "A|B|C|The Steading\\|Keir Allan\\|Braco|E".split(regex))
    System.out.println(s);

Output:

A
B
C
The Steading\|Keir Allan\|Braco
E

You can extend this to use a custom escape sequence as well:

String delim = "|";
String esc = "+";
String regex = "(?<!" + Pattern.quote(esc) + ")" + Pattern.quote(delim);

for (String s : "A|B|C|The Steading+|Keir Allan+|Braco|E".split(regex))
    System.out.println(s);

Output:

A
B
C
The Steading+|Keir Allan+|Braco
E
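
Note that the split leaves the escape character in the fourth part, while the question asks for it to be removed. A minimal sketch of one way to drop it after splitting is below. The class and method names (LookbehindSplit, splitAndUnescape) are hypothetical, and the sketch assumes the escape sequence only ever appears directly in front of a delimiter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class LookbehindSplit {
    // Hypothetical helper: split on delim unless it is directly preceded by esc,
    // then turn each escaped delimiter back into a plain delimiter.
    static List<String> splitAndUnescape(String input, String delim, String esc) {
        String regex = "(?<!" + Pattern.quote(esc) + ")" + Pattern.quote(delim);
        List<String> parts = new ArrayList<>();
        // Limit -1 keeps trailing empty parts, which split(regex) would drop.
        for (String part : input.split(regex, -1)) {
            // String.replace works on literal text, so no quoting is needed here.
            parts.add(part.replace(esc + delim, delim));
        }
        return parts;
    }

    public static void main(String[] args) {
        String input = "A|B|C|The Steading\\|Keir Allan\\|Braco|E";
        for (String s : splitAndUnescape(input, "|", "\\")) {
            System.out.println(s);
        }
    }
}
```

With the example string, the fourth line printed is The Steading|Keir Allan|Braco.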
arshajii
  • Thanks for the prompt response. But I would like to know whether all special characters are allowed in the regex. Are there any exceptions? – user2757740 Sep 07 '13 at 20:51
  • @user2757740 Read the linked documentation of `Pattern.quote()`; it takes a string and escapes all special characters. There aren't any exceptions. – arshajii Sep 07 '13 at 20:54
  • That solved most of my question, thanks a lot. But I wanted to apply Pattern.quote() to my escape sequence as well, and I'm afraid it is not working: `"(?<!" + Pattern.quote("\\\\") + ")" + Pattern.quote(delim)` – user2757740 Sep 07 '13 at 21:28
  • @user2757740 `"\\\\"` is already quoted. Perhaps you meant `Pattern.quote("\\")`. – arshajii Sep 07 '13 at 21:28
  • Say my escape sequence is "+" and the delimiter is "|". `"(?<!" + Pattern.quote("+") + ")" + Pattern.quote("|")` is not working. If I provide "+" without Pattern.quote(), it fails because it is a dangling metacharacter. But if I provide it with Pattern.quote(), the string is not split at all! – user2757740 Sep 07 '13 at 21:32
  • @user2757740 Yes, it would be `"(?<!" + Pattern.quote("+") + ")" + Pattern.quote("|")`. – arshajii Sep 07 '13 at 21:34
  • @user2757740 I just tried it (using `"A|B|C|The Steading+|Keir Allan+|Braco|E"`) and it worked as expected. – arshajii Sep 07 '13 at 21:35
  • Thanks bro! You made my day and saved my weekend. Many thanks for the help. \m/ – user2757740 Sep 07 '13 at 21:48
  • But this doesn't work!? It says "split on `|` as long as it's not preceded by a `\`", but what if that preceding `\` was escaped? For example, if the original string is `a\\|b`, then it _should_ split that into two parts: `a\` and `b`. Sure, if `\` is a reserved character that is never present in the parts, then it works, but why not then simply use `\` as delimiter?! – aioobe Jun 14 '22 at 19:54

I know this is an old thread, but the lookbehind solution has an issue: it doesn't allow escaping the escape character. For example, no split would occur after The Steading\\ in A|B|C|The Steading\\|Keir Allan\|Braco|E.

The positive-matching solution in the thread "Regex and escaped and unescaped delimiter" works better (modified to use Pattern.quote() if the delimiter is dynamic).
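
As a rough illustration of what positive matching could look like (this is a sketch of the general idea, not the code from the linked thread), the regex below matches one whole field at a time instead of matching the delimiter. The names PositiveMatchSplit and split are hypothetical, and the sketch assumes that delim and esc are non-empty and differ from each other.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PositiveMatchSplit {
    // Hypothetical helper: match each field explicitly, so an escaped escape
    // character no longer protects the delimiter that follows it.
    static List<String> split(String input, String delim, String esc) {
        String d = Pattern.quote(delim);
        String e = Pattern.quote(esc);
        // A field is a run of "escape + any char" pairs, or single chars that
        // start neither an escape nor a delimiter. It must be followed by a
        // delimiter or the end of input. \G anchors each match to the previous one.
        Pattern field = Pattern.compile(
                "\\G((?:" + e + ".?|(?!" + e + "|" + d + ").)*+)(" + d + "|\\z)",
                Pattern.DOTALL);
        // Used to drop the escape in front of each escaped character.
        Pattern escapedChar = Pattern.compile(e + "(.)", Pattern.DOTALL);

        List<String> parts = new ArrayList<>();
        Matcher m = field.matcher(input);
        while (m.find()) {
            parts.add(escapedChar.matcher(m.group(1)).replaceAll("$1"));
            if (m.group(2).isEmpty()) {
                break; // the field ended at the end of input, not at a delimiter
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        // Java literal for a\\|b: the first backslash escapes the second one,
        // so the | still splits.
        System.out.println(split("a\\\\|b", "|", "\\"));
    }
}
```

Here a\\|b is split into a\ and b, and the example string from the question yields The Steading|Keir Allan|Braco as its fourth part.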

Jan Cetkovsky
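// Splits str at every unescaped delimiter and passes each part to resultConsumer.
// A doubled escape character does not protect the delimiter that follows it.
// Escape characters are kept in the parts. Requires: import java.util.function.Consumer;
private static void splitString(String str, char escapeCharacter, char delimiter, Consumer<String> resultConsumer) {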
    final StringBuilder sb = new StringBuilder();
    boolean isEscaped = false;
    for (int i = 0; i < str.length(); i++) {
        char c = str.charAt(i);
        if (c == escapeCharacter) {
            isEscaped = ! isEscaped;
            sb.append(c);
        } else if (c == delimiter) {
            if (isEscaped) {
                sb.append(c);
                isEscaped = false;
            } else {
                resultConsumer.accept(sb.toString());
                sb.setLength(0);
            }
        } else {
            isEscaped = false;
            sb.append(c);
        }
    }
    resultConsumer.accept(sb.toString());
}
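
The method above keeps the escape characters in each part. If they should be dropped, as in the output the question asks for, a variant along these lines may help. It is only a sketch: the names ScanSplit and splitAndUnescape are hypothetical, and keeping a dangling escape at the end of the input as a literal character is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

class ScanSplit {
    // Hypothetical variant of the single-pass scan: returns the parts as a list
    // and removes the escape character in front of each escaped character.
    static List<String> splitAndUnescape(String str, char escapeCharacter, char delimiter) {
        List<String> parts = new ArrayList<>();
        StringBuilder sb = new StringBuilder();
        boolean escaped = false;
        for (int i = 0; i < str.length(); i++) {
            char c = str.charAt(i);
            if (escaped) {
                sb.append(c);            // escaped char is taken literally
                escaped = false;
            } else if (c == escapeCharacter) {
                escaped = true;          // drop the escape, remember the state
            } else if (c == delimiter) {
                parts.add(sb.toString());
                sb.setLength(0);
            } else {
                sb.append(c);
            }
        }
        if (escaped) {
            sb.append(escapeCharacter);  // assumption: keep a dangling escape as-is
        }
        parts.add(sb.toString());
        return parts;
    }

    public static void main(String[] args) {
        String input = "A|B|C|The Steading\\|Keir Allan\\|Braco|E";
        for (String s : splitAndUnescape(input, '\\', '|')) {
            System.out.println(s);
        }
    }
}
```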
Jonas_Hess