-2

I have a String:

"Hello world... I am here. Please respond."

and I would like to count the number of sentences within the String. My idea was to use a Scanner together with its useDelimiter method to split any String into sentences.

Scanner in = new Scanner(file);
in.useDelimiter("insert here");

I'd like to create a regular expression that can go through the String shown above and identify it as having two sentences. I initially tried using the delimiter:

[^?.]

It gets hung up on the ellipsis.

akash
Alok Bhatt

4 Answers

2

You could use a regular expression that matches a character that is not a sentence terminator, followed by one that is, like:

[^?!.][?!.]

However, as @Gabe Sechan points out, a regular expression may not be accurate when the sentence includes abbreviations such as Dr., Rd., St., etc.
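
A minimal sketch of how that pattern could be used to count sentences with `java.util.regex`. The class and method names (`TerminatorCounter`, `countSentences`) are made up for illustration and are not part of the original answer:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TerminatorCounter {
    // A character that is not '?', '!' or '.', immediately followed by one that is.
    private static final Pattern SENTENCE_END = Pattern.compile("[^?!.][?!.]");

    // Counts how many times the pattern occurs in the text.
    static int countSentences(String text) {
        Matcher matcher = SENTENCE_END.matcher(text);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "Hello world... I am here. Please respond.";
        System.out.println(countSentences(text)); // prints 3
    }
}
```

Because only the first dot of "..." is preceded by a non-terminator, the ellipsis is counted once. That means "Hello world..." is treated as a sentence of its own, so the sample yields 3; if an ellipsis should not end a sentence, the pattern would need further adjustment.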

Moishe Lipsker
1

This could help:

public int getNumSentences() 
{
    List<String> tokens = getTokens( "[^!?.]+" );
    return tokens.size();
}

You can also treat a line break as a separator, and keep that independent of your OS, with the following line of code:

String pattern = System.getProperty("line.separator") + " ";

You can find more about this here: Java regex: newline + white space

so the method finally becomes:

public int getNumSentences() 
{
    List<String> tokens = getTokens( "[^!?.]+" + pattern + "+" );
    return tokens.size();
}
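
The answer does not show `getTokens`, so here is a self-contained sketch under the assumption that it returns every substring of the text that matches the given regex. The `TokenDocument` class and its constructor are hypothetical and exist only to make the example runnable:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TokenDocument {
    private final String text;

    TokenDocument(String text) {
        this.text = text;
    }

    // Assumed behaviour of getTokens: collect every substring matching the regex.
    List<String> getTokens(String regex) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(text);
        while (matcher.find()) {
            tokens.add(matcher.group());
        }
        return tokens;
    }

    // Each maximal run of non-terminator characters is treated as one sentence.
    int getNumSentences() {
        return getTokens("[^!?.]+").size();
    }

    public static void main(String[] args) {
        TokenDocument doc = new TokenDocument("Hello world... I am here. Please respond.");
        System.out.println(doc.getNumSentences()); // prints 3
    }
}
```

Note that any run of characters between terminators counts, including whitespace only, so trailing spaces after the final period would add one to the result.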

Hope this helps :)

MMS
0

A regular expression probably isn't the right tool for this. English is not a regular language, so regular expressions get hung up a lot. For one thing, you can't even be sure that a period in the middle of the text ends a sentence: abbreviations (like Mr.), acronyms with periods, and initials will screw you up as well. It's not the right tool.
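
To illustrate the abbreviation problem, here is a small sketch that counts runs of terminators with a naive pattern. The class name, method name and sample text are invented for this example:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AbbreviationDemo {
    // Naive approach: every run of '.', '!' or '?' is assumed to end a sentence.
    static int countTerminatorRuns(String text) {
        Matcher matcher = Pattern.compile("[.!?]+").matcher(text);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // One sentence to a human reader, but it contains two abbreviations.
        String text = "Dr. Smith lives on Main St. in town.";
        System.out.println(countTerminatorRuns(text)); // prints 3
    }
}
```

The periods after "Dr" and "St" are indistinguishable from sentence-ending periods at the character level, which is why the count comes out as 3 instead of 1.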

Gabe Sechan
  • A fully correct solution is impossible. A mostly right one is possible. If this is homework (or self-study), as it feels like it is, the purpose of the question is probably to make you realise that, along with the limitations of regular expressions – Gabe Sechan Sep 13 '15 at 06:09
  • Another reason regex is a bad approach is that occasionally you will have a document where the end of line is considered the end of a sentence (while other times it is not). – demongolem Jun 26 '20 at 11:42
0

For your sentence: "Hello world... I am here. Please respond."

The code would be:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class JavaRegex {

    public static void main(String[] args) {
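        int count = 0;
        String sentence = "Hello world... I am here. Please respond.";
        // Match each run of sentence terminators, so "..." counts once and
        // a terminator at the very end of the string is still counted.
        Pattern pattern = Pattern.compile("[.!?]+");
        Matcher matcher = pattern.matcher(sentence);
        while (matcher.find()) {
            count++;
        }
        System.out.println("No. of sentences = " + count);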
    }

}
Avijit Karmakar