Finding the start and end position of a sentence in a paragraph with StanfordCoreNLP

Question

I was wondering how I can find the start and end position of a sentence in a paragraph using StanfordCoreNLP. Right now I am using DocumentPreprocessor to split the paragraph into sentences. Is it possible to get the start and end index of where the sentence is actually located in the original text?

I am using the code from another question asked here.

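import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import edu.stanford.nlp.ling.HasWord;
import edu.stanford.nlp.ling.Sentence;
import edu.stanford.nlp.process.DocumentPreprocessor;

String paragraph = "My 1st sentence. “Does it work for questions?” My third sentence.";
Reader reader = new StringReader(paragraph);
DocumentPreprocessor dp = new DocumentPreprocessor(reader);
List<String> sentenceList = new ArrayList<String>();

// Each element yielded by the DocumentPreprocessor is one sentence as a list of tokens.
for (List<HasWord> sentence : dp) {
   // Newer CoreNLP releases name this helper SentenceUtils.listToString.
   String sentenceString = Sentence.listToString(sentence);
   sentenceList.add(sentenceString);
}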

for (String sentence : sentenceList) {
   System.out.println(sentence);
}

Taken from: How can I split a text into sentences using the Stanford parser?

Thanks

score 2 · Accepted Answer · answered Feb 10 '16 at 23:45

The quick and dirty way to do this would be:

import edu.stanford.nlp.simple.*;

Document doc = new Document("My 1st sentence. “Does it work for questions?” My third sentence.");
for (Sentence sentence : doc.sentences()) {
  // Offsets are looked up by token index: begin of the first token, end of the last token.
  System.out.println(sentence.characterOffsetBegin(0) + " -- " + sentence.characterOffsetEnd(sentence.length() - 1));
}

Otherwise, you can extract the CharacterOffsetBeginAnnotation and CharacterOffsetEndAnnotation from a CoreLabel, and use that to find the token's offset in the original text.
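To illustrate the token-based approach, here is a minimal sketch of the arithmetic involved: a sentence starts at its first token's begin offset and ends at its last token's end offset. The Token class below is a hypothetical stand-in for a CoreLabel so that the example runs without the CoreNLP jar, and the offsets are typed in by hand. With CoreNLP itself you would read them from each token via CharacterOffsetBeginAnnotation and CharacterOffsetEndAnnotation instead.

```java
import java.util.Arrays;
import java.util.List;

public class SentenceSpans {

    /** Hypothetical stand-in for a CoreLabel: a token with its character offsets. */
    static final class Token {
        final String word;
        final int begin; // inclusive, plays the role of CharacterOffsetBeginAnnotation
        final int end;   // exclusive, plays the role of CharacterOffsetEndAnnotation

        Token(String word, int begin, int end) {
            this.word = word;
            this.begin = begin;
            this.end = end;
        }
    }

    /** Returns {begin, end} for a sentence: first token's begin, last token's end. */
    static int[] spanOf(List<Token> sentence) {
        if (sentence.isEmpty()) {
            throw new IllegalArgumentException("sentence has no tokens");
        }
        Token first = sentence.get(0);
        Token last = sentence.get(sentence.size() - 1);
        return new int[] { first.begin, last.end };
    }

    public static void main(String[] args) {
        String text = "My 1st sentence. My third sentence.";

        // Hand-written offsets; a real tokenizer would supply these.
        List<Token> first = Arrays.asList(
            new Token("My", 0, 2), new Token("1st", 3, 6),
            new Token("sentence", 7, 15), new Token(".", 15, 16));
        List<Token> second = Arrays.asList(
            new Token("My", 17, 19), new Token("third", 20, 25),
            new Token("sentence", 26, 34), new Token(".", 34, 35));

        for (List<Token> sentence : Arrays.asList(first, second)) {
            int[] span = spanOf(sentence);
            // substring(begin, end) recovers the sentence exactly as it appears in the original text.
            System.out.println(span[0] + " -- " + span[1] + ": " + text.substring(span[0], span[1]));
        }
    }
}
```

Because the end offset is exclusive, the pair can be passed straight to String.substring to recover the sentence from the original paragraph, including its original spacing and quote characters.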

score 0 · Answer 2 · answered Jun 10 '20 at 20:32

See https://www.programcreek.com/java-api-examples/?api=edu.stanford.nlp.ling.CoreLabel for examples of reading CharacterOffsetEndAnnotation from a CoreLabel.

George Kowalski

Link answers are okay, but generally you need to give a description of what the link says. This is because the website you linked may go down, meaning your answer will be useless. – 10 Rep Jun 10 '20 at 20:43
