I have the below code splitting a string on a regex:

string s = "test;3 régred";
string[] authorsList = Regex.Split(s, "(\\s+)|([\\p{P}\\p{S}])");
foreach (string q in authorsList)
{
    Console.WriteLine(q);
}

It's supposed to split the string and keep only:

test 3 régred

But it's storing:

test ; 3 *space* régred

Why is it not losing the delimiters?

Mong Zhu
  • This won't help you, but I suggest that instead of escaping your backslashes with another backslash, you use a verbatim string literal, e.g. `@"(\s+)|([\p{P}\p{S}])"`. That way you can paste the regex straight into an online tool like RegExr or Regex101. – Wai Ha Lee May 09 '19 at 14:04
  • Run it without the capturing groups `()`, like `string[] authorsList = Regex.Split(s, "\\s+|[\\p{P}\\p{S}]");`. See https://ideone.com/BczkoU – The fourth bird May 09 '19 at 14:05
  • @WaiHaLee It's a habit I really need to get out of, as a verbatim string is so much less effort than escaping everything –  May 09 '19 at 14:07
  • @Thefourthbird that did it - perfect –  May 09 '19 at 14:07

1 Answer

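You put the delimiters into capture groups by using `(...)`. When the pattern contains capturing parentheses, `Regex.Split` includes the captured text in the returned array, which is why the `;` and the space show up. Remove the parentheses and it will work fine: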

string[] authorsList = Regex.Split(s, @"\s+|[\p{P}\p{S}]");

Output:

test
3
régred
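
If you do need parentheses for grouping, two other options avoid the captured delimiters: a non-capturing group `(?:...)`, or passing `RegexOptions.ExplicitCapture` so that unnamed groups stop capturing. Below is a minimal sketch comparing the variants; the class and method names (`SplitDemo`, `SplitKeepingDelimiters`, and so on) are made up for illustration and are not part of any library.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; only Regex.Split and RegexOptions come from .NET.
public static class SplitDemo
{
    // Capturing groups: the matched delimiters are returned alongside the tokens.
    public static string[] SplitKeepingDelimiters(string input) =>
        Regex.Split(input, @"(\s+)|([\p{P}\p{S}])");

    // Non-capturing group (?:...): groups the alternation without capturing,
    // so the delimiters are dropped from the result.
    public static string[] SplitDroppingDelimiters(string input) =>
        Regex.Split(input, @"(?:\s+|[\p{P}\p{S}])");

    // Same pattern as the question, but ExplicitCapture turns unnamed
    // parentheses into non-capturing groups.
    public static string[] SplitWithExplicitCapture(string input) =>
        Regex.Split(input, @"(\s+)|([\p{P}\p{S}])", RegexOptions.ExplicitCapture);

    public static void Main()
    {
        string s = "test;3 régred";

        // Five elements: test, ";", 3, " ", régred
        Console.WriteLine(string.Join("|", SplitKeepingDelimiters(s)));

        // Three elements: test, 3, régred
        Console.WriteLine(string.Join("|", SplitDroppingDelimiters(s)));
        Console.WriteLine(string.Join("|", SplitWithExplicitCapture(s)));
    }
}
```

Note that if two delimiters are adjacent (for example `"a, b"`), any of these patterns will produce an empty string between them; `[\s\p{P}\p{S}]+` is one way to treat a run of delimiters as a single separator.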

For reference, here is the inverse question

Wai Ha Lee
Mong Zhu