I am trying to use the Stanford NLP pipeline for basic NER and POS tagging. I have slightly modified the CoreNLP code to accommodate both n-grams and training from a prop file. I suspect the problem is in the way I pass the tokens through the pipeline.

import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Properties;

import edu.stanford.nlp.ie.crf.CRFClassifier;
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.ling.CoreAnnotations.LemmaAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.NamedEntityTagAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.PartOfSpeechAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.TextAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.TokensAnnotation;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.process.CoreLabelTokenFactory;
import edu.stanford.nlp.sentiment.SentimentCoreAnnotations.SentimentClass;
import edu.stanford.nlp.util.CoreMap;
import edu.stanford.nlp.util.StringUtils;

public class TestQG {

    public static void main(String[] args) throws Exception {
        String originaltext = "Obama is the President of America";
        String modifiedtext = originaltext.replaceAll("[+,:;=?@#|<>.^*()%!]", "");

        stanfordNLPParser(modifiedtext);
    }

    public static void stanfordNLPParser(String modifiedtext) throws Exception {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, parse, sentiment, regexner");
        props.setProperty("tokenize.language", "es");
        props.setProperty("tokenize.whitespace", "true");
        props.setProperty("regexner.mapping", "resources/customRegexNER.txt");

        // Run the CRF classifier with the prop file, then point the pipeline at the NER model
        String[] args = new String[] { "-props", "resources/tempaSmNER.prop" };
        CRFClassifier.main(args);
        props.setProperty("ner.model", "resources/ner-model.ser.gz");

        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
        List<CoreLabel> tokens = new ArrayList<CoreLabel>();

        /* Collection to store the processed n-gram tokens (the n-gram tokenizer does not
           return CoreLabels, as required for the pipeline) */
        Collection<String> collectionOfProcessedTokens = new ArrayList<>();

        Annotation document = new Annotation(modifiedtext);
        pipeline.annotate(document);

        /* List of tokens (before performing the n-gram operations) */
        List<CoreLabel> tokensPreNgram = new ArrayList<CoreLabel>();
        List<CoreMap> sentences = document.get(CoreAnnotations.SentencesAnnotation.class);

        for (CoreMap sentence : sentences) {
            for (CoreLabel token : sentence.get(TokensAnnotation.class)) {
                tokensPreNgram.add(token);
            }

            // Build 1- to 6-grams as plain strings and join their words with underscores
            Collection<String> tokenPreprocessor = StringUtils.getNgramsFromTokens(tokensPreNgram, 1, 6);
            for (String tokenNgramPostProcessor : tokenPreprocessor) {
                collectionOfProcessedTokens.add(tokenNgramPostProcessor.replaceAll(" ", "_"));
            }

            // Convert the n-gram strings back into CoreLabels
            CoreLabelTokenFactory string2CoreLabel = new CoreLabelTokenFactory();
            for (String temp : collectionOfProcessedTokens) {
                tokens.add(string2CoreLabel.makeToken(temp, temp, 0, 10));
            }

            for (CoreLabel token : tokens) {
                System.out.println("For TEST: Current Token is  :" + token);

                // generate Parts Of Speech
                String tokenPOS = token.get(PartOfSpeechAnnotation.class);
                System.out.println("For TEST: POS is   :" + tokenPOS);

                // The NullPointerException is thrown here: the POS annotation is null
                if (token.get(PartOfSpeechAnnotation.class).matches(".*(WP).*")) {
                    System.out.println(token.get(TextAnnotation.class));
                }
                String tokenNER = token.get(NamedEntityTagAnnotation.class);
                String tokenSentiment = token.get(SentimentClass.class);
                String tokenlemma = token.get(LemmaAnnotation.class);
            }
        }
    }
}

I get a NullPointerException at the point where I read the POS tag. The token is generated correctly, but its POS tag is null. I believe there must be a problem in the way I construct the pipeline.

The java.lang.NullPointerException is thrown at

`if (token.get(PartOfSpeechAnnotation.class).matches(".*(WP).*")){ System.out.println(token.get(TextAnnotation.class));}`

because `String tokenPOS = token.get(PartOfSpeechAnnotation.class);` returns null instead of a POS tag.

Any pointers as to why?

ChrisF
Betafish
  • Please [edit] your question to include the null pointer exception stacktrace and indicate which line of your code is producing the exception. – Kenster Sep 30 '16 at 10:42
  • `Exception in thread "main" java.lang.NullPointerException at com.idml.quesgen.TestQG.stanfordNLPParser(TestQG.java:65) at com.idml.quesgen.TestQG.main(TestQG.java:8)` – Betafish Sep 30 '16 at 10:46
  • `...com.idml.quesgen.TestQG.stanfordNLPParser(TestQG.java:65)` And which line is line 65? – Kenster Sep 30 '16 at 10:49
  • I've updated the code now. Can you see anything as to why? – Betafish Sep 30 '16 at 11:02
  • (1) @filburt @rc -- this is clearly not asking "what is a null pointer?", this is asking why CoreNLP throws a null pointer exception in this case; i.e., why an annotation doesn't exist. (2) The problem here is that your tokens list is not obtained from sentence.get(TokensAnnotation.class), but rather from a series of transformations on the tokens, one of which (`tokenPreprocessor`) strips away all the annotations and treats a token as just a string. When you convert this back into a CoreLabel, the POS tags are no longer set (how could they be?). – Gabor Angeli Sep 30 '16 at 15:44
  • @GaborAngeli, you are right. I presumed the same. But what I don't understand is that I actually use the `CoreLabelTokenFactory` to build the tokens back from strings. I believe I should then run the pipeline again (starting with the annotators). What I tried was to pass these tokens into another annotator pipeline, but the annotator only accepts text and not CoreLabels. Any ideas on what I could have done differently? – Betafish Oct 03 '16 at 05:16
  • You can always rebuild your string with each of your new tokens separated by a space and use the option "tokenize.whitespace" set to "true". Could you provide more clarity about what you are trying to do though? – StanfordNLPHelp Oct 10 '16 at 07:29
  • @GaborAngelli, post your comment as an answer to help future users – Jake Nov 03 '16 at 05:15
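
One way to read the suggestion from StanfordNLPHelp above, as a minimal sketch: build the underscore-joined n-grams as plain strings, join them with single spaces, and hand that string to the pipeline as ordinary text instead of constructing CoreLabels by hand. The class and method names below are hypothetical, the n-gram helper is a plain-JDK stand-in for `StringUtils.getNgramsFromTokens` (its ordering may differ), and the CoreNLP call itself is left out so the block runs with the JDK alone.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of CoreNLP.
public class NgramRetokenizer {

    /** Builds every n-gram of minSize..maxSize consecutive words, joined with '_'. */
    public static List<String> underscoreNgrams(List<String> words, int minSize, int maxSize) {
        List<String> ngrams = new ArrayList<>();
        for (int start = 0; start < words.size(); start++) {
            for (int size = minSize; size <= maxSize && start + size <= words.size(); size++) {
                ngrams.add(String.join("_", words.subList(start, start + size)));
            }
        }
        return ngrams;
    }

    /** Joins the n-grams with single spaces, so a whitespace tokenizer sees one token per n-gram. */
    public static String toWhitespaceTokenizedText(List<String> ngrams) {
        return String.join(" ", ngrams);
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("Obama", "is", "the", "President");
        String text = toWhitespaceTokenizedText(underscoreNgrams(words, 1, 2));
        System.out.println(text);
    }
}
```

The resulting string could then be wrapped in `new Annotation(text)` and annotated by a pipeline whose properties set `tokenize.whitespace` to `true`, as the question's code already does for the original text. The tokens read back from `sentence.get(TokensAnnotation.class)` would then carry whatever annotations the configured annotators produce. Whether the taggers give useful results for underscore-joined tokens is a separate matter.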

0 Answers