I want to get the whole sentence and the whole paragraph that contain a word I search for. For example, if I search for "released" in the text "Hundreds of political prisoners have been released, and censorship rules have been relaxed. The EU and US have lifted the majority of sanctions against Burma as a result. ", it should return "Hundreds of political prisoners have been released, and censorship rules have been relaxed." as well as the whole paragraph.
3 Answers
How structured is your data?
You can probably get paragraphs by looking for one or more consecutive newline characters. For sentences you are going to need to do some text segmentation. For example, the NLTK library for Python provides a pre-trained Punkt sentence segmenter, which is trained on a large corpus in order to learn that things like Mr. and U.S.A. do not mark the end of a sentence even though they contain periods (see this question: Python split text on sentences).
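Since the other answers use Java, here is a rough Java sketch of the same two steps. It assumes that any run of line breaks separates paragraphs, and it swaps NLTK's trained Punkt model for the JDK's rule-based java.text.BreakIterator. BreakIterator is not trained on a corpus, so it may still split after abbreviations such as "Mr.". The class name TextSegmenter is hypothetical.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: paragraphs by line breaks, sentences by the JDK BreakIterator.
class TextSegmenter {

    // Assumption: one or more consecutive line breaks (\R+) end a paragraph.
    static List<String> paragraphs(String text) {
        List<String> result = new ArrayList<>();
        for (String raw : text.split("\\R+")) {
            String paragraph = raw.trim();
            if (!paragraph.isEmpty()) result.add(paragraph);
        }
        return result;
    }

    // Rule-based sentence segmentation (not a trained model like Punkt).
    static List<String> sentences(String paragraph) {
        List<String> result = new ArrayList<>();
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.US);
        it.setText(paragraph);
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            // Each segment includes trailing whitespace, so trim it off.
            String sentence = paragraph.substring(start, end).trim();
            if (!sentence.isEmpty()) result.add(sentence);
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "Hundreds of political prisoners have been released, and censorship rules have been relaxed. "
                + "The EU and US have lifted the majority of sanctions against Burma as a result.\n\n"
                + "A second paragraph.";
        for (String paragraph : paragraphs(text)) {
            for (String sentence : sentences(paragraph)) {
                System.out.println(sentence);
            }
        }
    }
}
```

If abbreviations matter for your data, a trained segmenter (Punkt, or a similar model in a Java NLP library) is the safer choice.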
Once you can segment your text into paragraphs and sentences, you need to decide whether you just want to do a linear pass over your corpus or, more likely, index your data using information retrieval techniques, such as building an inverted index or using an existing solution like Apache Lucene.
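To make the inverted-index idea concrete, here is a toy in-memory version: each word maps to the set of sentence ids that contain it, so a lookup avoids scanning every sentence. It assumes sentences are already segmented, lowercases everything, and treats any run of non-letter, non-digit characters as a word boundary. The class InvertedIndex and its methods are hypothetical and are not Lucene's API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy inverted index: word -> ids of the sentences containing it.
class InvertedIndex {
    private final List<String> sentences = new ArrayList<>();
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    void add(String sentence) {
        int id = sentences.size();
        sentences.add(sentence);
        // Assumption: split on anything that is not a letter or digit.
        for (String token : sentence.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, k -> new TreeSet<>()).add(id);
            }
        }
    }

    // Single-word lookup; returns matching sentences in insertion order.
    List<String> search(String word) {
        List<String> hits = new ArrayList<>();
        Set<Integer> ids = postings.getOrDefault(word.toLowerCase(Locale.ROOT), Collections.<Integer>emptySet());
        for (int id : ids) {
            hits.add(sentences.get(id));
        }
        return hits;
    }

    public static void main(String[] args) {
        InvertedIndex index = new InvertedIndex();
        index.add("Hundreds of political prisoners have been released, and censorship rules have been relaxed.");
        index.add("The EU and US have lifted the majority of sanctions against Burma as a result.");
        System.out.println(index.search("released"));
    }
}
```

To get the paragraph as well, you could keep a second list with each sentence's paragraph under the same id. Multi-word queries would intersect the posting sets; Lucene handles this, plus stemming, ranking, and persistence, for you.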

Use indexOf, then search backwards and forwards for the paragraph separator. It might be <p> or \n.
public static String findParagraph(String source, String searchText, String paragraphSeparator)
{
    final int locationOfSearchTerm = source.indexOf(searchText);
    if (locationOfSearchTerm == -1) return null;
    // look for the next separator after the search term
    int paragraphEnd = source.indexOf(paragraphSeparator, locationOfSearchTerm + searchText.length());
    // if we didn't find an end of a paragraph, we want to go to the end of the source
    if (paragraphEnd == -1) paragraphEnd = source.length();
    // look for the last separator before the search term
    int paragraphStart = source.lastIndexOf(paragraphSeparator, locationOfSearchTerm);
    // if we didn't find a start of a paragraph, we want to go to the beginning;
    // otherwise start just after the separator itself
    paragraphStart = (paragraphStart == -1) ? 0 : paragraphStart + paragraphSeparator.length();
    return source.substring(paragraphStart, paragraphEnd);
}
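The same indexOf-based idea can return the sentence instead of the paragraph. Searching for ". " by hand breaks on decimals and abbreviations, so this sketch asks java.text.BreakIterator for the sentence boundaries around the match. The class and method names are hypothetical, the search is case-sensitive, and a match that spans two sentences only returns the first of them.

```java
import java.text.BreakIterator;
import java.util.Locale;

// Hypothetical sentence-level counterpart of findParagraph.
class SentenceFinder {

    // Returns the sentence containing the first occurrence of searchText, or null.
    static String findSentence(String source, String searchText) {
        if (searchText.isEmpty()) return null;
        int location = source.indexOf(searchText);
        if (location == -1) return null;
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.US);
        it.setText(source);
        int end = it.following(location); // first sentence boundary after the match start
        int start = it.previous();        // the boundary before it, i.e. the sentence start
        // BreakIterator keeps trailing spaces inside the sentence, so trim them.
        return source.substring(start, end).trim();
    }

    public static void main(String[] args) {
        String text = "Hundreds of political prisoners have been released, and censorship rules have been relaxed. "
                + "The EU and US have lifted the majority of sanctions against Burma as a result. ";
        System.out.println(findSentence(text, "released"));
        System.out.println(findSentence(text, "lifted"));
    }
}
```

Combined with findParagraph above, this gives both pieces the question asks for from a single search term.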

There are many ways to do this. Here is one.
Create a map that associates each sentence with its paragraph:
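import java.util.HashMap;
import java.util.Map;
Map<String, String> map = new HashMap<String, String>();
map.put("Hundreds of political prisoners have been released, and censorship rules have been relaxed.", "Hundreds of political prisoners have been released, and censorship rules have been relaxed. The EU and US have lifted the majority of sanctions against Burma as a result.");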
...
Once you have built a map with all your texts, you can search it this way:
public Map<String, String> searchInSentence(String toFind, Map<String, String> texts) {
    Map<String, String> result = new HashMap<String, String>();
    for (Map.Entry<String, String> entry : texts.entrySet()) {
        // keep every sentence (key) containing the search term, together with its paragraph (value)
        if (entry.getKey().contains(toFind)) {
            result.put(entry.getKey(), entry.getValue());
        }
    }
    return result;
}
It will return a Map where the sentence is the key and the paragraph is the value.
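Filling the map by hand does not scale, so here is one possible way to build it from raw text. It assumes line breaks separate paragraphs and uses java.text.BreakIterator for sentences. A LinkedHashMap keeps the original order, and the search method repeats the logic above so the block compiles on its own. The class SentenceParagraphMap and its build method are hypothetical names.

```java
import java.text.BreakIterator;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical builder for the sentence -> paragraph map described above.
class SentenceParagraphMap {

    // Assumption: one or more line breaks end a paragraph.
    static Map<String, String> build(String text) {
        Map<String, String> map = new LinkedHashMap<>();
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.US);
        for (String raw : text.split("\\R+")) {
            String paragraph = raw.trim();
            if (paragraph.isEmpty()) continue;
            it.setText(paragraph); // resets the iterator to the start of this paragraph
            int start = it.first();
            for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
                String sentence = paragraph.substring(start, end).trim();
                if (!sentence.isEmpty()) map.put(sentence, paragraph);
            }
        }
        return map;
    }

    // Same filtering as searchInSentence above: sentence key contains the term.
    static Map<String, String> searchInSentence(String toFind, Map<String, String> texts) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : texts.entrySet()) {
            if (entry.getKey().contains(toFind)) result.put(entry.getKey(), entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "Hundreds of political prisoners have been released, and censorship rules have been relaxed. "
                + "The EU and US have lifted the majority of sanctions against Burma as a result.\n\n"
                + "A second paragraph.";
        searchInSentence("released", build(text))
                .forEach((sentence, paragraph) -> System.out.println(sentence + " => " + paragraph));
    }
}
```

One design caveat: because the sentence is the key, an identical sentence appearing in two paragraphs keeps only the last paragraph. A Map<String, List<String>> would keep all of them.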

..
? – sanbhat May 24 '13 at 11:38