
I have a string that contains a paragraph, possibly with line breaks. I want to get only the first sentence of the string. I thought I would try

indexOf(". ") 

that is, a dot followed by a space.

The problem is that this won't work on a line such as firstName. LastName.

I'm using .NET. Is there a good method for this? I'm also tagging Java to see if that helps narrow down my search.

Asked by Geek; edited by hippietrail.

  • indexOf() is your best bet if you are not interested in natural language parsing. Are there any restrictions on the input paragraph? For example, do all sentences end in ". ", or can they also end with ? and !? – Colin D May 01 '12 at 18:26
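To follow up on the comment's point about ? and !, here is a minimal Java sketch (the class and method names are hypothetical). It treats '.', '!' or '?' as the end of a sentence only when followed by whitespace (including a line break) or by the end of the input. Unlike a plain indexOf("."), it skips periods inside numbers such as "1.5". However, it still cuts "firstName. LastName." after "firstName.", because a regular expression cannot tell an abbreviation from a real sentence end.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FirstSentence {
    // Assumption: a sentence ends at '.', '!' or '?' that is followed by
    // whitespace (spaces or line breaks) or by the end of the input.
    private static final Pattern END = Pattern.compile("[.!?](?=\\s|$)");

    // Returns the text up to and including the first terminator,
    // or the whole (trimmed) text if no terminator is found.
    public static String firstSentence(String text) {
        Matcher m = END.matcher(text);
        if (m.find()) {
            return text.substring(0, m.end()).trim();
        }
        return text.trim();
    }

    public static void main(String[] args) {
        System.out.println(firstSentence("Hello world!\nThis is an example."));
        System.out.println(firstSentence("Version 1.5 is out. Upgrade now."));
        // Limitation: abbreviation-like input is still split early.
        System.out.println(firstSentence("firstName. LastName."));
    }
}
```

The lookahead keeps the whitespace out of the match, so the returned sentence ends exactly at its punctuation mark.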

3 Answers


What you need is a natural language processing (NLP) toolkit. Writing one yourself is very hard, because it requires a lot of research and data collection, but luckily it has already been done for you.

.NET

SharpNLP is a collection of natural language processing tools written in C#. Currently it provides the following NLP tools:

  • a sentence splitter
  • ...
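SharpNLP's splitter is based on trained models. For comparison only, the JDK ships a much simpler, rule-based splitter, java.text.BreakIterator.getSentenceInstance(). The sketch below (the class and helper names are hypothetical) uses it to return the first sentence. It handles ?, ! and ordinary periods, but it has no dictionary of abbreviations, so input like "Mr. Pudelhund" may still be split early.

```java
import java.text.BreakIterator;
import java.util.Locale;

public class SentenceBreak {
    // Uses the JDK's rule-based sentence boundary analysis. This is NOT a
    // trained NLP model: abbreviations such as "Mr." may be treated as ends.
    public static String firstSentence(String text, Locale locale) {
        BreakIterator it = BreakIterator.getSentenceInstance(locale);
        it.setText(text);
        int start = it.first();
        int end = it.next();
        if (end == BreakIterator.DONE) {
            return text.trim(); // empty input: no boundary after the start
        }
        // The boundary includes trailing whitespace, so trim it off.
        return text.substring(start, end).trim();
    }

    public static void main(String[] args) {
        System.out.println(firstSentence("Hello world. This is an example.", Locale.US));
        System.out.println(firstSentence("Is it raining? I think so.", Locale.US));
    }
}
```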

Java

Answered by Mark Byers.

You need some way to mark the end of a sentence. As you already noted, a "." doesn't do that, since it can also be used in other ways ("Hi, my name is Mr. Pudelhund."). If possible, I would recommend marking sentence ends with a character that is not used anywhere else in the text.
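A minimal sketch of the marker idea, assuming you control how the string is built and can pick a character that never occurs in the text itself. The '|' marker and the class name are hypothetical choices for illustration, not something the answer specifies.

```java
public class MarkerSplit {
    // Hypothetical marker: assumes the text is prepared so that '|' appears
    // only at the end of each sentence and nowhere else.
    static final char SENTENCE_END = '|';

    // Returns everything before the first marker, or the whole text if there is none.
    public static String firstSentence(String text) {
        int idx = text.indexOf(SENTENCE_END);
        return (idx < 0 ? text : text.substring(0, idx)).trim();
    }

    public static void main(String[] args) {
        String text = "Hi, my name is Mr. Pudelhund.|This is the second sentence.|";
        System.out.println(firstSentence(text)); // prints: Hi, my name is Mr. Pudelhund.
    }
}
```

Because the periods no longer carry any meaning for the split, "Mr." is left intact; the trade-off is that whoever produces the string must insert the marker reliably.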

Edit: The other method is good as well, but much more complicated. If you can't edit the string you are using, though, that method beats mine ;)

Answered by pudelhund.

This can be done with a very simple implementation using String.substring():

String example = "Hello world. This is example. ";
// Take everything up to and including the first "." (returns "" if there is no ".")
System.out.print(example.substring(0, example.indexOf(".") + 1)); // --> Hello world.
Answered by volkangurbuz.