I'd like to do a Regex.Split on some separators but I'd like to keep the separators. To give an example of what I'm trying:

"abc[s1]def[s2][s3]ghi" --> "abc", "[s1]", "def", "[s2]", "[s3]", "ghi"

The regular expression I've come up with is new Regex("\\[|\\]|\\]\\["). However, this gives me the following:

"abc[s1]def[s2][s3]ghi" --> "abc", "s1", "def", "s2", "", "s3", "ghi"

The separators have disappeared (which makes sense given my regex). Is there a way to write the regex so that the separators themselves are preserved?

Ronald Wildenberg

2 Answers

Use zero-length lookarounds, which match a position rather than any characters; you want to split on

(?=\[)|(?<=\])

That is, split at any position that is immediately followed by a literal [ or immediately preceded by a literal ]. Because neither assertion consumes characters, the brackets stay in the resulting pieces.

As a C# string literal, this is

@"(?=\[)|(?<=\])"

Example in Java

    System.out.println(java.util.Arrays.toString(
        "abc[s1]def[s2][s3]ghi".split("(?=\\[)|(?<=\\])")
    ));
    // prints "[abc, [s1], def, [s2], [s3], ghi]"

    System.out.println(java.util.Arrays.toString(
        "abc;def;ghi;".split("(?<=;)")
    ));
    // prints "[abc;, def;, ghi;]"

    System.out.println(java.util.Arrays.toString(
        "OhMyGod".split("(?=(?!^)[A-Z])")
    ));
    // prints "[Oh, My, God]"
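
To see why these patterns keep the separators, it may help to list where the first pattern actually matches. Each match is an empty string sitting at a boundary, so the split has nothing to throw away. A minimal sketch, where the class name `SplitPoints` and the helper `splitPoints` are made up for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SplitPoints {
    // Hypothetical helper: returns the index of every zero-length match of regex in input.
    static List<Integer> splitPoints(String regex, String input) {
        List<Integer> points = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            // For a pure lookaround, start() == end(): no characters are consumed.
            if (m.start() == m.end()) {
                points.add(m.start());
            }
        }
        return points;
    }

    public static void main(String[] args) {
        System.out.println(splitPoints("(?=\\[)|(?<=\\])", "abc[s1]def[s2][s3]ghi"));
        // prints "[3, 7, 10, 14, 18]"
    }
}
```

Index 14 appears only once even though both assertions hold there (a ] behind and a [ ahead), which is why no empty string shows up between "[s2]" and "[s3]".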

polygenelubricants

You could use Regex.Matches instead of Regex.Split and match the pieces themselves rather than the gaps between them, for example (http://www.ideone.com/gUjRM):

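using System.Text.RegularExpressions;

string x = "abc[s1]def[s2][s3]ghi";
// Match either a run of non-'[' characters or one bracketed group.
var r = new Regex(@"[^\[]+|\[[^\]]+\]");
var ms = r.Matches(x);
// do stuff with the MatchCollection `ms`; each Match is one piece, separators included.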
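
For comparison with the Java example in the other answer, the same matching idea might look roughly like this in Java. This is a sketch rather than a port of the linked snippet: the class name `Tokenize` and the method `tokens` are invented here, and the pattern is the one from the C# code above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Tokenize {
    // Same pattern as the C# snippet: a run of non-'[' characters, or one bracketed group.
    private static final Pattern TOKEN = Pattern.compile("[^\\[]+|\\[[^\\]]+\\]");

    // Hypothetical helper: collects every match in the order it is found.
    static List<String> tokens(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("abc[s1]def[s2][s3]ghi"));
        // prints "[abc, [s1], def, [s2], [s3], ghi]"
    }
}
```

One thing to keep in mind with this approach: a [ that does not start a well-formed, non-empty [...] group matches neither alternative, so it is silently dropped from the output instead of being reported.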
kennytm