
I want to split a text into sentences on "." and "?". You can see the output I need under "The Actual Output Needed" below. However, whenever I use a regex or .split("\\."), the result is wrong because it also splits after "Mr.", where the period does not end the sentence. How can I split only on a period that actually ends a sentence?

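import java.util.Scanner;

public class SentenceSplitting {
    public static void main(String[] args) {
        Scanner ss = new Scanner(System.in);
        System.out.println("Enter the string");
        String sentence = ss.nextLine();

        // Split on a space that follows ., ! or ? and precedes a character that is
        // not itself ., ! or ?. This also splits after abbreviations like "Mr.", "i.e." and "Jr."
        String[] hold = sentence.split("(?<=[.!?]) (?=[^.!?])");

        for (int i = 0; i < hold.length; i++) {
            System.out.println(hold[i]);
        }
    }
}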

Output

Mr.
Smith bought cheapsite.com for 1.5 million dollars, i.e.
he paid a lot for it.
Did he mind?
Adam Jones Jr.
thinks he didn't.
In any case, this isn't true...
Well, with a probability of .9 it isn't.
The result should be:

The Actual Output Needed

Mr. Smith bought cheapsite.com for 1.5 million dollars, i.e. he paid a lot for it.
Did he mind?
Adam Jones Jr. thinks he didn't.
In any case, this isn't true...
Well, with a probability of .9 it isn't.

Input

Mr. Smith bought cheapsite.com for 1.5 million dollars, i.e. he paid a lot for it. Did he mind? Adam Jones Jr. thinks he didn't. In any case, this isn't true... Well, with a probability of .9 it isn't. The result should be:

Stone
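One way to keep the question's split regex, `(?<=[.!?]) (?=[^.!?])`, is to add a negative lookbehind that refuses to split right after a known abbreviation. This is a minimal sketch, not a general solution: the abbreviation list (`Mr`, `Mrs`, `Dr`, `Jr`, `i.e`, `e.g`) and the class and method names are hypothetical, and any abbreviation not in the list will still cause a split. Java accepts this lookbehind because its alternatives have a bounded maximum length.

```java
class AbbreviationAwareSplit {
    // Hypothetical abbreviation list, each written without its final period.
    // Java lookbehind needs a bounded maximum length, which a fixed list satisfies.
    private static final String SPLIT_REGEX =
            "(?<!\\b(?:Mr|Mrs|Dr|Jr|i\\.e|e\\.g)\\.)" // not right after a listed abbreviation
            + "(?<=[.!?])"                             // right after ., ! or ?
            + " (?=[^.!?])";                           // the space itself, before a non-terminator

    static String[] splitSentences(String text) {
        return text.split(SPLIT_REGEX);
    }

    public static void main(String[] args) {
        String input = "Mr. Smith bought cheapsite.com for 1.5 million dollars, i.e. he paid a lot for it. "
                + "Did he mind? Adam Jones Jr. thinks he didn't. In any case, this isn't true... "
                + "Well, with a probability of .9 it isn't. The result should be:";
        for (String s : splitSentences(input)) {
            System.out.println(s);
        }
    }
}
```

The list is the weak point: every new abbreviation has to be added to the lookbehind by hand.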
  • It appears that nothing in the text itself decides where to split; it's a matter of grammar, not regex – azro Sep 28 '17 at 15:37
    Looking at your input, how can you (the human) tell when to split the sentence and when not to? I do not see a pattern in your input that would let me make the decision myself, let alone a computer. – blurfus Sep 28 '17 at 15:38
  • Go to the answer, then look at the pastebin comment. https://stackoverflow.com/questions/26704900/how-to-identify-a-end-of-a-sentence – IamBatman Sep 28 '17 at 15:38
  • @IamBatman that doesn't solve the issue since it would still split after `Mr.` and `i.e.`. The problem is that it's hard to find a pattern that differentiates between abbreviations and the end of a sentence. – Thomas Sep 28 '17 at 15:42
  • Yes @IamBatman, you are right: it still splits after **Mr.** Is there any other way to do it? – Stone Sep 28 '17 at 15:43
  • The only way I can see (although I'm not an expert on this) would be to first replace all periods in abbreviations with something else, then split, and finally put the periods back in (by replacing the replacement again). The remaining problem would be how to recognize abbreviations. A brute-force method could be to provide them manually, but that's very fragile. – Thomas Sep 28 '17 at 15:44
    @Stone Parsing languages and detecting sentences is a complex issue. I doubt it can be done with regular expressions. – OH GOD SPIDERS Sep 28 '17 at 15:44
  • @OHGODSPIDERS I've tried a lot, but it still splits after **Mr.** How can I do it? – Stone Sep 28 '17 at 15:47
  • Make a dictionary, list, array, whatever, of things that aren't ends of sentences, like Mr., Dr., Mrs. or i.e. Then loop through the string word by word and use a conditional: if the word is one of the dictionary items, don't split; if it isn't, split. At least that is one way to do it. It'll just be a living dictionary, list, array or whatever; you'll learn what is and what isn't an abbreviation and add to it. Eventually you may cover all the items that don't need to be split. – IamBatman Sep 28 '17 at 20:56
  • Finally done, I found a way to do it. Just look at the answer below. – Stone Sep 30 '17 at 15:43
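Thomas's comment suggests masking the periods inside abbreviations, splitting, and then restoring them. This is a minimal sketch of that idea under two assumptions: a hypothetical hand-written abbreviation list (which, as the comment warns, is fragile) and a Unicode private-use character (`\uE000`) as the placeholder, assumed never to occur in real input. The masking is plain substring replacement, so it does not check word boundaries.

```java
import java.util.ArrayList;
import java.util.List;

class PlaceholderSplit {
    // Hypothetical, hand-maintained list of abbreviations whose periods must not end a sentence.
    private static final String[] ABBREVIATIONS = {"Mr.", "Mrs.", "Dr.", "Jr.", "i.e.", "e.g."};
    // Private-use character, assumed never to appear in the input text.
    private static final String PLACEHOLDER = "\uE000";

    static List<String> split(String text) {
        // 1. Replace the periods inside known abbreviations with the placeholder.
        String masked = text;
        for (String abbr : ABBREVIATIONS) {
            masked = masked.replace(abbr, abbr.replace(".", PLACEHOLDER));
        }
        // 2. Split on whitespace that follows ., ! or ? (same idea as the question's regex).
        String[] parts = masked.split("(?<=[.!?])\\s+(?=[^.!?])");
        // 3. Put the real periods back into every sentence.
        List<String> sentences = new ArrayList<>();
        for (String part : parts) {
            sentences.add(part.replace(PLACEHOLDER, "."));
        }
        return sentences;
    }

    public static void main(String[] args) {
        String input = "Mr. Smith bought cheapsite.com for 1.5 million dollars, i.e. he paid a lot for it. "
                + "Did he mind? Adam Jones Jr. thinks he didn't. In any case, this isn't true... "
                + "Well, with a probability of .9 it isn't. The result should be:";
        split(input).forEach(System.out::println);
    }
}
```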

1 Answer

import java.text.BreakIterator;
import java.util.Locale;

public class SentenceSplitting1 {
    public static void main(String[] args) {
        // Sentence-boundary iterator from the JDK, using US English rules
        BreakIterator iterator = BreakIterator.getSentenceInstance(Locale.US);
        // Note: this string writes "Mr.Smith" without a space, unlike the question's input
        String source = "Mr.Smith bought cheapsite.com for 1.5 million dollars, i.e. he paid a lot for it. Did he mind? Adam Jones Jr. thinks he didn't. In any case, this isn't true... Well, with a probability of .9 it isn't. The result should be:";
        iterator.setText(source);
        int start = iterator.first();

        // Walk consecutive boundaries; each [start, end) range is one sentence
        for (int end = iterator.next();
                end != BreakIterator.DONE;
                start = end, end = iterator.next()) {
            System.out.println(source.substring(start, end));
        }
    }
}
Stone
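The answer's `BreakIterator` loop can also be combined with the abbreviation-list idea from the comments. On their own, the JDK's sentence rules may still report a boundary after an abbreviation such as "Mr." when it is followed by a space and a capitalized word. This is a sketch, not a tested general solution. It glues a segment to the next one whenever the text collected so far ends in a listed abbreviation. The abbreviation set and the names `BreakIteratorMerge` and `sentences` are hypothetical, and `Set.of` assumes Java 9 or later.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class BreakIteratorMerge {
    // Hypothetical abbreviation list; extend as needed (Set.of requires Java 9+).
    private static final Set<String> ABBREVIATIONS =
            Set.of("Mr.", "Mrs.", "Dr.", "Jr.", "i.e.", "e.g.");

    static List<String> sentences(String text) {
        BreakIterator iterator = BreakIterator.getSentenceInstance(Locale.US);
        iterator.setText(text);
        List<String> result = new ArrayList<>();
        StringBuilder pending = new StringBuilder(); // text not yet emitted as a sentence
        int start = iterator.first();
        for (int end = iterator.next(); end != BreakIterator.DONE; start = end, end = iterator.next()) {
            pending.append(text, start, end);
            String trimmed = pending.toString().trim();
            if (trimmed.isEmpty()) {
                continue; // whitespace only, keep collecting
            }
            String[] words = trimmed.split("\\s+");
            String lastWord = words[words.length - 1];
            // A segment that ends in a known abbreviation is glued to the next segment.
            if (!ABBREVIATIONS.contains(lastWord)) {
                result.add(trimmed);
                pending.setLength(0);
            }
        }
        // Flush whatever is left, e.g. text that itself ends with an abbreviation.
        String rest = pending.toString().trim();
        if (!rest.isEmpty()) {
            result.add(rest);
        }
        return result;
    }

    public static void main(String[] args) {
        String source = "Mr. Smith bought cheapsite.com for 1.5 million dollars, i.e. he paid a lot for it. "
                + "Did he mind? Adam Jones Jr. thinks he didn't. In any case, this isn't true... "
                + "Well, with a probability of .9 it isn't. The result should be:";
        sentences(source).forEach(System.out::println);
    }
}
```

The merge step works whether or not `BreakIterator` breaks after a listed abbreviation. If it does not break there, the segment already contains the whole sentence.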