Similar questions already exist, but all of their answers use regular expressions. The code I'm currently using, which strips the separators:

string[] sentences = s.Split(new string[] { ". ", "? ", "! ", "... " }, StringSplitOptions.None);

I would like to split a block of text on sentence breaks and keep the sentence terminators. I'd like to avoid regular expressions for performance reasons. Is that possible?

Isaac G.

1 Answer

I don't believe there is an existing function that does this. However, you can use an extension method like the following.

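using System.Collections.Generic;
using System.Linq;
using System.Text;
public static class StringExtensions {
  public static IEnumerable<string> SplitAndKeepSeparators(this string source, string[] separators) {
    var builder = new StringBuilder();
    foreach (var cur in source) {
      builder.Append(cur);
      // Split once the accumulated text ends with one of the separators.
      if (separators.Any(sep => EndsWith(builder, sep))) {
        yield return builder.ToString();
        builder.Length = 0;
      }
    }
    if (builder.Length > 0) {
      yield return builder.ToString();
    }
  }
  // Compares the tail of the builder with sep without allocating a string.
  private static bool EndsWith(StringBuilder builder, string sep) {
    int offset = builder.Length - sep.Length;
    if (sep.Length == 0 || offset < 0) return false;
    for (int i = 0; i < sep.Length; i++) if (builder[offset + i] != sep[i]) return false;
    return true;
  }
}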
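
Since the question is about performance, here is a rough alternative sketch that scans with ordinal IndexOf instead of appending one character at a time. It is not part of the answer above: the names SentenceSplitDemo and SplitKeepingSeparators are made up for illustration, and it assumes the desired rule is "cut right after whichever separator ends first". It has not been benchmarked, so measure before relying on it.

```csharp
using System;
using System.Collections.Generic;

public static class SentenceSplitDemo {
  // Hypothetical helper: yields each piece of source together with the
  // separator that terminates it; a trailing piece without a separator is
  // yielded as-is.
  public static IEnumerable<string> SplitKeepingSeparators(string source, string[] separators) {
    // next[i] caches the position of the next match of separators[i]; -1 means none left.
    var next = new int[separators.Length];
    for (int i = 0; i < separators.Length; i++) {
      next[i] = separators[i].Length == 0
        ? -1
        : source.IndexOf(separators[i], 0, StringComparison.Ordinal);
    }

    int start = 0;
    while (start < source.Length) {
      int cut = int.MaxValue; // index just past the separator that ends first
      for (int i = 0; i < separators.Length; i++) {
        // A cached match that begins before the current piece is stale; search again.
        if (next[i] >= 0 && next[i] < start) {
          next[i] = source.IndexOf(separators[i], start, StringComparison.Ordinal);
        }
        if (next[i] >= 0) {
          cut = Math.Min(cut, next[i] + separators[i].Length);
        }
      }
      if (cut == int.MaxValue) {
        // No separator left: the rest of the text is the last piece.
        yield return source.Substring(start);
        yield break;
      }
      yield return source.Substring(start, cut - start);
      start = cut;
    }
  }
}
```

Called with the separators from the question on "Hello world. How are you?", this should yield "Hello world. " followed by "How are you?". The second piece has no trailing space, so it matches none of the separators and is returned as the remainder.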
JaredPar