-2

I want to isolate each sentence from a .txt file with its punctuation still attached. Is there a simple way to do this? Below is the gist of what I have so far in C#. (If you use 'regex', could you explain the concept in layman's terms?)

string data = System.IO.File.ReadAllText(filePath);

string[] sentences = data.Split(
  new char[] { '.', '!', '?' },
  StringSplitOptions.RemoveEmptyEntries); 

foreach (string s in sentences)
{
    // Trim() with no arguments strips leading and trailing whitespace
    Console.WriteLine(s.Trim());
}
Dmitry Bychenko
  • 1
    What is the desired result for `"?abc!?de.pq!!"`, please? I ask in order to get a clear picture of what to do with the delimiters (where to put them and what to do with empty entries) – Dmitry Bychenko Jul 22 '22 at 15:53
  • @DmitryBychenko I'm using delimiters (i.e. '.', '!', and '?') to separate a .txt file into individual complete sentences. It works, but it removes the delimiter from the substrings (e.g. "How are you?" becomes "How are you"), and I need them still attached. I don't know what "?abc!?de.pq!!" is; I'm guessing it has to do with patterns and/or 'regex'. Could you elaborate, please? – kendall.tubbs Jul 22 '22 at 16:01
  • "?abc!?de.pq!!" is a (maybe weird) example of input data. You should provide some sample data and the desired output when asking a question. A good example helps avoid misunderstandings caused by language problems on Stack Overflow. – Luuk Jul 22 '22 at 16:03
  • 2
    @kendall.tubbs: there are *ambiguities* in the requirements for a new `Split` routine: where should I put the delimiter? If I have `"a.b"`, should the result be `["a.", "b"]` or `["a", ".", "b"]` or, maybe, `["a", ".b"]`? You want to drop empty entries, but what should I do with their delimiters then? If I'm given the string `"a!?b"`, should I provide `["a!", "b"]` as an answer? Or `["a!?", "b"]`, or something else? In order *not to flood* you with such questions, I made an example string `"?abc!?de.pq!!"` so I can see what your rules are. Your current `Split` returns `["abc", "de", "pq"]`. – Dmitry Bychenko Jul 22 '22 at 16:10

1 Answer

-1

Use this routine as a starting point and adjust it to your requirements:

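// Requires: using System.Collections.Generic; using System.Text;
// On .NET Framework, also: using System.Linq; (for Contains on a string with a char)
static string[] SentenceSplitter(string text, string delimiters=".!?")
{
    var sentences = new List<string>();
    var sb = new StringBuilder();

    foreach(char c in text)
    {
        // Every character, delimiter or not, is kept in the current sentence
        sb.Append(c);

        if (delimiters.Contains(c))
        {
            // A delimiter closes the current sentence
            sentences.Add(sb.ToString());
            sb.Clear();
        }
    }

    // Keep trailing text that has no closing delimiter
    if (sb.Length > 0)
    {
        sentences.Add(sb.ToString());
    }

    return sentences.ToArray();
}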

Add error handling to cope with null text, empty sentences, and sentences that start with whitespace.
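The error handling mentioned above could look roughly like the following. This is only a sketch under assumed rules, since the question does not state them: null input throws, each chunk is trimmed, chunks that are blank after trimming are dropped, and a chunk consisting only of a delimiter is kept. The names `SentenceDemo`, `SplitSentences` and `AddIfNotBlank` are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class SentenceDemo
{
    // Hypothetical hardened variant of the routine above (assumed rules, see lead-in).
    public static string[] SplitSentences(string text, string delimiters = ".!?")
    {
        if (text == null) throw new ArgumentNullException(nameof(text));
        if (delimiters == null) throw new ArgumentNullException(nameof(delimiters));

        var sentences = new List<string>();
        var sb = new StringBuilder();

        foreach (char c in text)
        {
            sb.Append(c);

            // IndexOf(char) is available on every .NET version
            if (delimiters.IndexOf(c) >= 0)
            {
                AddIfNotBlank(sentences, sb);
            }
        }

        // Trailing text without a closing delimiter
        AddIfNotBlank(sentences, sb);

        return sentences.ToArray();
    }

    // Trims the buffered chunk, stores it unless it is blank, and resets the buffer.
    private static void AddIfNotBlank(List<string> sentences, StringBuilder sb)
    {
        string s = sb.ToString().Trim();
        sb.Clear();
        if (s.Length > 0)
        {
            sentences.Add(s);
        }
    }

    static void Main()
    {
        // Prints three lines: "How are you?", "I am fine." and "Great!"
        foreach (string s in SplitSentences("How are you? I am fine.  Great!"))
        {
            Console.WriteLine(s);
        }
    }
}
```

Note that under these assumed rules every delimiter ends a chunk on its own, so `"?abc!?de.pq!!"` yields `?`, `abc!`, `?`, `de.`, `pq!`, `!`. If runs such as `!?` should stay together, the rules (and the code) need to change accordingly.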

Axel Kemper
  • 1
    I am not a downvoter, but I think it's a premature solution: we don't know the rules yet. For instance, what should we return for `"a?!b"` - where should we put saved delimiters, how should we treat empty entries and their delimiters etc. – Dmitry Bychenko Jul 22 '22 at 16:17
  • 1
    I have no doubt that you (having 9k reputation) can easily adopt any policy for any edge case in this question. But I doubt that the edge cases are *academic*: if the file contains plain English text, we may well face `!!`, `?!`, `...`, `?..` combinations as *punctuation*. I tried to help the new user pin down such a difficult-to-define question with example(s): given the desired result for my `"?abc!?de.pq!!"`, one can easily derive the rules. – Dmitry Bychenko Jul 22 '22 at 17:41
  • 1
    I'm not a downvoter, but I've called your code "premature" because your solution is good only if you have *guessed the rules right*; if not, the code can be misleading: it works in typical cases, but goes wrong in special ones. If you *clearly state the rules* you play by ("glue the delimiter to the first chunk, never drop empty chunks"), provide *examples*, and *generalize* the solution (why should we *hardcode* `delimiters`? What should happen on `null` input? Let's *teach* the new user to `throw new ArgumentNullException(nameof(text))`), you'll get my *upvote* – Dmitry Bychenko Jul 22 '22 at 17:56
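Since the question also asked about regex in layman's terms: a regular expression is a small pattern language that describes what a piece of text looks like, and the regex engine finds every stretch of the input that fits the description. As a sketch of one possible rule set (an assumption, not something the asker confirmed), the pattern below says "any run of non-delimiter characters followed by one or more delimiters, or else a final run with no delimiter at all", so combinations like `!!`, `?!` and `...` stay glued to the text before them. The class name `RegexSentenceDemo` is made up for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class RegexSentenceDemo
{
    // [^.!?]*  zero or more characters that are NOT '.', '!' or '?'
    // [.!?]+   one or more of '.', '!' or '?' (a whole run such as "?!" or "...")
    // |        or else
    // [^.!?]+  leftover text at the end that has no closing delimiter
    private static readonly Regex Sentence = new Regex(@"[^.!?]*[.!?]+|[^.!?]+");

    public static string[] SplitSentences(string text)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));

        return Sentence.Matches(text)
            .Cast<Match>()                 // MatchCollection is non-generic on older frameworks
            .Select(m => m.Value.Trim())   // drop whitespace around each sentence
            .Where(s => s.Length > 0)      // skip chunks that were only whitespace
            .ToArray();
    }

    static void Main()
    {
        // Prints four lines: "?", "abc!?", "de." and "pq!!"
        foreach (string s in SplitSentences("?abc!?de.pq!!"))
        {
            Console.WriteLine(s);
        }
    }
}
```

With this rule, `"Wait... What?! No"` becomes `Wait...`, `What?!` and `No`. A different policy (for example, dropping a leading lone `?`) would need a different pattern.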