-1

I want to get the whole sentence and the whole paragraph that contain a given word. For example, if I search for "released" in the text "Hundreds of political prisoners have been released, and censorship rules have been relaxed. The EU and US have lifted the majority of sanctions against Burma as a result." it should return "Hundreds of political prisoners have been released, and censorship rules have been relaxed." and the whole paragraph as well.

Wesley Baugh
Aryan G
  • 1
    You can use `contains`: "*Returns true if and only if this string contains the specified sequence of char values.*" – Maroun May 24 '13 at 11:37
  • 1
    Not sure what the context is. How do you identify a paragraph? Is your text enclosed within .. ? – sanbhat May 24 '13 at 11:38
  • from where will you search? – stinepike May 24 '13 at 11:39
  • Why so many votes to close? Seems like a valid question to me. – John B May 24 '13 at 11:41
  • I agree with Maroun and sanbhat. The simplest way seems to be to split your string into sentences (split on `.!?`) and paragraphs (what marker is being used?) and use `contains` on each. You might be able to use a regular expression, but it would be complex and error-prone. – John B May 24 '13 at 11:43

3 Answers

1

How structured is your data?

You can probably get paragraphs by looking for runs of one or more newline characters. For sentences you are going to need to do some text segmentation. For example, using the NLTK library for Python you can use a pre-trained Punkt sentence segmenter, which is trained on a large corpus in order to learn that things like "Mr." and "U.S.A." do not mark the end of a sentence even though they contain periods (see this question: Python split text on sentences).
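As a rough Java counterpart to the segmentation described above, here is a minimal sketch. It swaps NLTK's Punkt for the JDK's `java.text.BreakIterator`, which is rule-based rather than trained, so it may still split on some abbreviations. The class and method names (`TextSegmenter`, `paragraphs`, `sentences`, `sentencesContaining`) are made up for illustration, and it assumes paragraphs are separated by blank lines.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class TextSegmenter {

    // Assumption: paragraphs are separated by at least one blank line.
    static List<String> paragraphs(String text) {
        List<String> result = new ArrayList<>();
        for (String p : text.split("\\n\\s*\\n")) {
            String trimmed = p.trim();
            if (!trimmed.isEmpty()) result.add(trimmed);
        }
        return result;
    }

    // Splits one paragraph into sentences using the JDK's rule-based iterator.
    static List<String> sentences(String paragraph) {
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        it.setText(paragraph);
        List<String> result = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String s = paragraph.substring(start, end).trim();
            if (!s.isEmpty()) result.add(s);
        }
        return result;
    }

    // Linear pass: every sentence (in any paragraph) that contains the word.
    static List<String> sentencesContaining(String text, String word) {
        List<String> hits = new ArrayList<>();
        for (String paragraph : paragraphs(text)) {
            for (String sentence : sentences(paragraph)) {
                if (sentence.contains(word)) hits.add(sentence);
            }
        }
        return hits;
    }
}
```

Calling `TextSegmenter.sentencesContaining(text, "released")` walks each paragraph in turn, so the enclosing paragraph is available in the outer loop if you need to return it too.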

Once you can segment your text into paragraphs and sentences, you need to decide whether you just want to do a linear pass over your corpus or, more likely, index your data using information retrieval techniques, such as by building an inverted index or by using an existing solution like Apache Lucene.
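To make the inverted-index idea concrete, here is a small in-memory sketch. It is not how Lucene works internally. The class name `InvertedIndex` and its methods are hypothetical, and it assumes the text has already been split into sentences and that case-insensitive whole-word matching is what you want.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class InvertedIndex {
    // Sentence id -> sentence text.
    private final List<String> sentences = new ArrayList<>();
    // Lower-cased word -> ids of the sentences that contain it.
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    void add(String sentence) {
        int id = sentences.size();
        sentences.add(sentence);
        // Tokenize on anything that is not a letter or a digit.
        for (String token : sentence.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, k -> new TreeSet<>()).add(id);
            }
        }
    }

    // Looks the word up directly instead of scanning every sentence.
    List<String> search(String word) {
        Set<Integer> ids = postings.getOrDefault(
                word.toLowerCase(Locale.ROOT), Collections.<Integer>emptySet());
        List<String> hits = new ArrayList<>();
        for (int id : ids) hits.add(sentences.get(id));
        return hits;
    }
}
```

The trade-off is the usual one: building the index costs time and memory up front, but each lookup no longer depends on the size of the corpus.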

Wesley Baugh
1

Use `indexOf`, then search backwards and forwards for the paragraph separator, which might be `<p>` or `\n`.

public static String findParagraph(String source, String searchText, String paragraphSeparator)
{
    final int locationOfSearchTerm = source.indexOf(searchText);
    if (locationOfSearchTerm == -1) return null;

    int paragraphEnd = source.indexOf(paragraphSeparator, locationOfSearchTerm + searchText.length());

    //if we didn't find an end of a paragraph, we want to go to the end of the source
    if (paragraphEnd == -1) paragraphEnd = source.length();

    int paragraphStart = source.lastIndexOf(paragraphSeparator, locationOfSearchTerm);

    //if we didn't find a start of a paragraph, we want to go to the beginning;
    //otherwise skip past the separator so it is not part of the result
    paragraphStart = (paragraphStart == -1) ? 0 : paragraphStart + paragraphSeparator.length();

    //substring's end index is exclusive, so the separator at paragraphEnd is left out
    return source.substring(paragraphStart, paragraphEnd);
}
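The same backwards-and-forwards scan can be adapted to sentences. The sketch below is a naive variant that treats every `.`, `!` or `?` as a sentence end, so it will cut sentences short at abbreviations such as "Mr."; it also assumes the search text itself contains no terminator. `SentenceFinder` and `findSentence` are hypothetical names, not part of any library.

```java
class SentenceFinder {
    // Assumption: any of these characters ends a sentence.
    private static final String TERMINATORS = ".!?";

    static String findSentence(String source, String searchText) {
        int hit = source.indexOf(searchText);
        if (hit == -1) return null;

        // Scan backwards until the character before 'start' ends the previous sentence.
        int start = hit;
        while (start > 0 && TERMINATORS.indexOf(source.charAt(start - 1)) == -1) {
            start--;
        }

        // Scan forwards until we reach the terminator that ends this sentence.
        int end = hit + searchText.length();
        while (end < source.length() && TERMINATORS.indexOf(source.charAt(end)) == -1) {
            end++;
        }
        // Include the terminator itself, if one was found.
        if (end < source.length()) end++;

        return source.substring(start, end).trim();
    }
}
```

For text where abbreviations matter, a proper sentence segmenter is a safer choice than this character scan.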
weston
0

There are a lot of ways to do this. Here is one:

Create a map which associates each sentence with its paragraph:

Map<String, String> map = new HashMap<String, String>();
map.put("Hundreds of political prisoners have been released, and censorship rules have been relaxed.", "Hundreds of political prisoners have been released, and censorship rules have been relaxed. The EU and US have lifted the majority of sanctions against Burma as a result.");
...

Once you have built a map with all your texts you can search this way:

import java.util.HashMap;
import java.util.Map;
import java.util.Map.Entry;

public Map<String, String> searchInSentence(String toFind, Map<String, String> texts) {
    Map<String, String> result = new HashMap<String, String>();
    // keep every (sentence, paragraph) pair whose sentence contains the search text
    for (Entry<String, String> entry : texts.entrySet()) {
        if (entry.getKey().contains(toFind)) {
            result.put(entry.getKey(), entry.getValue());
        }
    }

    return result;
}

It will return a Map where the sentence is the key and the paragraph is the value.

Julien Bodin