
How can I count the number of sentences in a given string?

M.Babcock
Luke G
  • Depends on the definition of "sentence" and what exactly is in the string. – Matti Virkkunen Feb 05 '12 at 17:21
  • What have you tried so far? What might these sentences contain? Will they always end with a ".", or possibly question marks/exclamation marks, etc? Might they contain numbers that have decimal points in? – Rob Levine Feb 05 '12 at 17:21
  • Count the number of `.`, `?`, `!` – Gilad Naaman Feb 05 '12 at 17:23
  • @Gilad - that would only give you a very naive solution. What if the sentence contains numbers with decimal points for example? There are many exceptions to this simplistic approach. It all depends what the OP is trying to do as to whether this would suffice. – Rob Levine Feb 05 '12 at 17:25
  • If a sentence mentions Dr. Miller and Dr. Rubinstein, is it still a single sentence? – vgru Feb 05 '12 at 17:29
  • @Gilad I am trying to implement the Flesch–Kincaid readability test. There are dots at the end of sentences as well as within sentences, e.g. domains, IP addresses, etc. – Luke G Feb 05 '12 at 17:36

2 Answers


You would need a natural language parsing library for that.

You could for example use SharpNLP which is a C# port of the OpenNLP project.

SharpNLP is a collection of natural language processing tools written in C#. Currently it provides the following NLP tools:

  • a sentence splitter
  • etc...

The article Statistical parsing of English sentences has some details on how to install and use the sentence detector in SharpNLP. The example code from that article is repeated below as a teaser, but please read the documentation for a more complete description of the features available and how they should be used.

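using OpenNLP.Tools.SentenceDetect;

// ...

// mModelPath is the folder containing the SharpNLP model files (it is
// concatenated directly with the file name, so it needs a trailing separator);
// input is the string whose sentences you want to count
EnglishMaximumEntropySentenceDetector sentenceDetector = 
  new EnglishMaximumEntropySentenceDetector(mModelPath + "EnglishSD.nbin");
string[] sentences = sentenceDetector.SentenceDetect(input);

// each element of the array is one detected sentence
int sentenceCount = sentences.Length;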

If you can assume a simple rule about your sentences, such as that they all end in a period and that a period appears nowhere else apart from at the end of a sentence, then you could instead just count the number of periods in your text. However, note that English text does not typically fit this pattern because:

  • There are other characters that can end a sentence apart from a period (for example `?` and `!`).
  • The period has other uses in English apart from ending sentences (abbreviations such as "Dr.", decimal numbers, domain names, IP addresses).
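If a heuristic is good enough, the period-counting idea can be tightened a little. The sketch below is only an illustration (the class name and the abbreviation list are made up for this example, not taken from any library): it counts `.`, `?` and `!` only when they are followed by whitespace or the end of the text, which skips dots inside numbers, domains and IP addresses, and it ignores a small hard-coded list of abbreviations such as "Dr.". It is still not a substitute for a trained sentence detector: sentences ending in `."`, unlisted abbreviations, and abbreviations that genuinely end a sentence will be miscounted.

```csharp
using System;
using System.Collections.Generic;

public static class NaiveSentenceCounter
{
    // Illustrative abbreviation list (an assumption); a real one would be much longer.
    // "etc" is deliberately left out because it often ends a sentence.
    private static readonly HashSet<string> Abbreviations =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "dr", "mr", "mrs", "ms", "prof", "e.g", "i.e" };

    private static bool IsTerminator(char c) => c == '.' || c == '?' || c == '!';

    public static int Count(string text)
    {
        int count = 0;
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (!IsTerminator(c))
                continue;

            bool atEnd = i + 1 == text.Length;

            // collapse runs such as "?!" or "..." into a single terminator
            if (!atEnd && IsTerminator(text[i + 1]))
                continue;

            // a terminator must be followed by whitespace or the end of the text;
            // this skips the dots in 3.14, example.com and 192.168.0.1
            if (!atEnd && !char.IsWhiteSpace(text[i + 1]))
                continue;

            // skip a period that ends a known abbreviation such as "Dr."
            if (c == '.' && PrecededByAbbreviation(text, i))
                continue;

            count++;
        }
        return count;
    }

    // True if the whitespace-delimited word just before dotIndex is a known abbreviation.
    private static bool PrecededByAbbreviation(string text, int dotIndex)
    {
        int start = dotIndex;
        while (start > 0 && !char.IsWhiteSpace(text[start - 1]))
            start--;
        return Abbreviations.Contains(text.Substring(start, dotIndex - start));
    }
}
```

For example, `NaiveSentenceCounter.Count("Dr. Miller met Dr. Rubinstein.")` returns 1, and `NaiveSentenceCounter.Count("Pi is 3.14. Visit example.com today.")` returns 2.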
Mark Byers
  • There may be dots in other places - I am trying to implement the Flesch–Kincaid readability test. – Luke G Feb 05 '12 at 17:34
  • @LukeG You might have to use Microsoft Word interop as suggested in the answers to your other question http://stackoverflow.com/questions/9151097/flesch-kincaid-readability-test – Roy Goode Feb 05 '12 at 17:46
  • @Roy Goode Good point, I updated my answer to demonstrate accessing all of the document statistics. – Kevin McCormick Feb 05 '12 at 17:51

If you already have Word installed, you can use Word interop to get the sentence count, as well as other statistics. This also has the benefit of potentially working with other languages besides English.

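using System;
using Word = Microsoft.Office.Interop.Word;

// ...

object oMissing = System.Reflection.Missing.Value;

// start a hidden instance of Word and add an empty document
var oWord = new Word.Application();
oWord.Visible = false;
var oDoc = oWord.Documents.Add(ref oMissing, ref oMissing, ref oMissing, ref oMissing);

// inputTextBox and sentenceCountLabel are controls on the form
oDoc.Content.Text = inputTextBox.Text;

//get just sentence count
sentenceCountLabel.Text = oDoc.Sentences.Count.ToString();

//get all statistics
foreach (Word.ReadabilityStatistic stat in oDoc.ReadabilityStatistics)
{
    Console.WriteLine("{0}: {1}", stat.Name, stat.Value);
}

// close the document without saving, then quit Word so that
// no WINWORD.EXE process is left running in the background
object oFalse = false;
oDoc.Close(ref oFalse, ref oMissing, ref oMissing);
oWord.Quit(ref oFalse, ref oMissing, ref oMissing);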

This will output:

 Words: 283 
 Characters: 1271 
 Paragraphs: 3 
 Sentences: 6 
 Sentences per Paragraph: 2 
 Words per Sentence: 47.1 
 Characters per Word: 4.3 
 Passive Sentences: 0 
 Flesch Reading Ease: 55.2 
 Flesch-Kincaid Grade Level: 12.5

This is not the most efficient approach, since it starts a hidden instance of Word, but it only requires a few lines of code and may be suitable depending on your needs.
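Since the end goal here is the Flesch–Kincaid test, it may help to see where the sentence count goes. Below is a minimal sketch of the two standard Flesch formulas that the last two lines of the output above are named after. The class and method names are illustrative, and the syllable count is assumed to come from elsewhere, since neither answer covers it. Word's own figures may differ from a hand-rolled version because it counts words and syllables in its own way.

```csharp
public static class Readability
{
    // Flesch Reading Ease: higher scores mean easier text.
    public static double FleschReadingEase(int words, int sentences, int syllables)
    {
        double wordsPerSentence = (double)words / sentences;
        double syllablesPerWord = (double)syllables / words;
        return 206.835 - 1.015 * wordsPerSentence - 84.6 * syllablesPerWord;
    }

    // Flesch-Kincaid Grade Level: approximate U.S. school grade needed to read the text.
    public static double FleschKincaidGrade(int words, int sentences, int syllables)
    {
        double wordsPerSentence = (double)words / sentences;
        double syllablesPerWord = (double)syllables / words;
        return 0.39 * wordsPerSentence + 11.8 * syllablesPerWord - 15.59;
    }
}
```

For example, 100 words in 5 sentences with 150 syllables gives a Reading Ease of about 59.6 and a grade level of about 9.9.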

Kevin McCormick