I am trying to use the Stanford NLP pipeline for some basic NER and POS tagging. I have slightly modified the CoreNLP code to accommodate both n-grams and training a custom NER model from a prop file. I suspect the problem is in the way I pass the tokens through the pipeline.
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Properties;

import edu.stanford.nlp.ie.crf.CRFClassifier;
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.ling.CoreAnnotations.LemmaAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.NamedEntityTagAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.PartOfSpeechAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.TextAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.TokensAnnotation;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.process.CoreLabelTokenFactory;
import edu.stanford.nlp.sentiment.SentimentCoreAnnotations.SentimentClass;
import edu.stanford.nlp.util.CoreMap;
import edu.stanford.nlp.util.StringUtils;

public class TestQG {

    public static void main(String[] args) throws Exception {
        String originaltext = "Obama is the President of America";
        // strip punctuation/special characters before annotation
        String modifiedtext = originaltext.replaceAll("[+,:;=?@#|<>.^*()%!]", "");
        stanfordNLPParser(modifiedtext);
    }

    public static void stanfordNLPParser(String modifiedtext) throws Exception {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, parse, sentiment, regexner");
        props.setProperty("tokenize.language", "es");
        props.setProperty("tokenize.whitespace", "true");
        props.setProperty("regexner.mapping", "resources/customRegexNER.txt");

        // train the custom NER model from the prop file, then point the pipeline at the serialized model
        String[] crfArgs = new String[] { "-props", "resources/tempaSmNER.prop" };
        CRFClassifier.main(crfArgs);
        props.setProperty("ner.model", "resources/ner-model.ser.gz");

        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
        List<CoreLabel> tokens = new ArrayList<CoreLabel>();
        /* Collection for the processed n-gram tokens (the n-gram helper returns Strings,
           not the CoreLabels required by the pipeline) */
        Collection<String> collectionOfProcessedTokens = new ArrayList<>();

        Annotation document = new Annotation(modifiedtext);
        pipeline.annotate(document);

        /* List of tokens (before performing the n-gram operations) */
        List<CoreLabel> tokensPreNgram = new ArrayList<CoreLabel>();
        List<CoreMap> sentences = document.get(CoreAnnotations.SentencesAnnotation.class);
        for (CoreMap sentence : sentences) {
            for (CoreLabel token : sentence.get(TokensAnnotation.class)) {
                tokensPreNgram.add(token);
            }
            // build 1- to 6-grams from the annotated tokens; the helper returns plain Strings
            Collection<String> tokenPreprocessor = StringUtils.getNgramsFromTokens(tokensPreNgram, 1, 6);
            for (String tokenNgramPostProcessor : tokenPreprocessor) {
                collectionOfProcessedTokens.add(tokenNgramPostProcessor.replaceAll(" ", "_"));
            }
            // wrap each n-gram String back into a CoreLabel
            CoreLabelTokenFactory string2CoreLabel = new CoreLabelTokenFactory();
            for (String temp : collectionOfProcessedTokens) {
                tokens.add(string2CoreLabel.makeToken(temp, temp, 0, 10));
            }
            for (CoreLabel token : tokens) {
                System.out.println("For TEST: Current Token is: " + token);
                // generate Parts Of Speech
                String tokenPOS = token.get(PartOfSpeechAnnotation.class);
                System.out.println("For TEST: POS is: " + tokenPOS);
                if (token.get(PartOfSpeechAnnotation.class).matches(".*(WP).*")) {
                    System.out.println(token.get(TextAnnotation.class));
                }
                String tokenNER = token.get(NamedEntityTagAnnotation.class);
                String tokenSentiment = token.get(SentimentClass.class);
                String tokenlemma = token.get(LemmaAnnotation.class);
            }
        }
    }
}
I am getting a NullPointerException at the point where I look up the POS tag. The token itself prints correctly, but the POS comes back as null, so I believe there must be a problem with the way I construct the pipeline.

The java.lang.NullPointerException is thrown at

if (token.get(PartOfSpeechAnnotation.class).matches(".*(WP).*")) { System.out.println(token.get(TextAnnotation.class)); }

because

String tokenPOS = token.get(PartOfSpeechAnnotation.class);

is not returning a POS tag.
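As a sanity check (a minimal sketch using the same annotation classes), I can print the POS from tokensPreNgram, i.e. the tokens that actually came out of pipeline.annotate(), rather than from the rebuilt n-gram tokens; I would expect those to carry tags:

            for (CoreLabel pipelineToken : tokensPreNgram) {
                // these CoreLabels were produced by the pipeline, so they should carry a POS annotation
                System.out.println(pipelineToken.word() + " -> " + pipelineToken.get(PartOfSpeechAnnotation.class));
            }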
Any pointers as to why?
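For reference, resources/tempaSmNER.prop follows the standard CRFClassifier training template (the trainFile path below is a placeholder, not my actual training file):

    # standard CRFClassifier training properties; trainFile is a placeholder path
    trainFile = resources/ner-training-data.tsv
    serializeTo = resources/ner-model.ser.gz
    map = word=0,answer=1

    useClassFeature = true
    useWord = true
    useNGrams = true
    noMidNGrams = true
    maxNGramLeng = 6
    usePrev = true
    useNext = true
    useSequences = true
    usePrevSequences = true
    maxLeft = 1
    useTypeSeqs = true
    useTypeSeqs2 = true
    useTypeySequences = true
    wordShape = chris2useLC
    useDisjunctive = true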