3

I know you have seen many questions like mine, but I hope this one is a little different. I'm making a translator and I want to split a text into sentences, but when I wrote this code:

public static string[] GetSentences(string Text)
{
    if (Text.Contains(". ") || Text.Contains("? ") || Text.Contains("! "))
        return Text.Split(new string[] { ". ", "? ", "! " }, StringSplitOptions.RemoveEmptyEntries);
    else
        return new string[0];
}

It removed the ".", "?" and "!". I want to keep them. How can I do that?


NOTE: I want to split on ". " (a dot and a space), "? " (a question mark and a space)...

mc110
user3260312

2 Answers

16

Simple: replace them first. I'll use "|" as the marker for readability, but you may want to use something more exotic that cannot occur in the text itself.

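// This part could be made a little smarter and more flexible.
// So, just the basic idea: mark each boundary with '|', then split on the marker.
Text = Text.Replace(". ", ". |").Replace("? ", "? |").Replace("! ", "! |");

if (Text.Contains("|"))
    // The char[] overload of Split is available on every .NET version.
    return Text.Split(new[] { '|' }, StringSplitOptions.RemoveEmptyEntries);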

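I also wonder about the "else return new string[0];" branch, which seems odd. Assuming that you want to return the input string when there are no delimiters, you should just remove the if/else construct: Split already returns a one-element array containing the whole input when no delimiter is found.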
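
For illustration, here is a minimal sketch of the same idea as a complete method with the if/else removed. The class name SentenceSplitter and the choice of '\u0001' as the marker are assumptions made for this example, not part of the original code. Unlike the snippet above, it replaces the space with the marker instead of keeping it, so the returned sentences carry no trailing space.

```csharp
using System;

public static class SentenceSplitter
{
    // Hypothetical marker: a control character that is unlikely to occur in normal text.
    private const char Marker = '\u0001';

    public static string[] GetSentences(string text)
    {
        // Swap the space after each delimiter for the marker,
        // so the punctuation itself survives the split.
        string marked = text.Replace(". ", "." + Marker)
                            .Replace("? ", "?" + Marker)
                            .Replace("! ", "!" + Marker);

        // No if/else needed: when no marker is present,
        // Split returns the whole input as a single element.
        return marked.Split(new[] { Marker }, StringSplitOptions.RemoveEmptyEntries);
    }
}
```

With this sketch, SentenceSplitter.GetSentences("Hello world. How are you? Fine! Bye.") should give "Hello world.", "How are you?", "Fine!" and "Bye.", and an input without any delimiter comes back as a single-element array.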

H H
2

Regex way:

return Regex.Split(Text, @"(?<=[.?!])\s+");

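So you just split the string on whitespace that is preceded by one of ".", "?" or "!". The lookbehind (?<=[.?!]) only checks for the punctuation without consuming it, so the punctuation stays attached to its sentence.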
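
As a rough sketch, the one-liner can be wrapped in a self-contained method like this. The class name RegexSentenceSplitter is an assumption made for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexSentenceSplitter
{
    public static string[] GetSentences(string text)
    {
        // (?<=[.?!]) requires '.', '?' or '!' just before the match but does not consume it.
        // \s+ is the run of whitespace that the split actually removes.
        return Regex.Split(text, @"(?<=[.?!])\s+");
    }
}
```

Calling RegexSentenceSplitter.GetSentences("It works.  Does it? Yes!") should give "It works.", "Does it?" and "Yes!". Two differences from the Replace approach are worth keeping in mind: \s+ also matches tabs and line breaks, not only a single space, and if the input ends with whitespace after the final punctuation mark, Regex.Split includes an empty string as the last element.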

(?<=[.?!])\s+


Ulugbek Umirov