
So I am trying to run CoreNLP in an embedded fashion.

Currently, I load a single instance of the StanfordCoreNLP pipeline (and keep a static reference to it in Main) before any calls, then call annotate to tag the parts of each sentence, and then walk the resulting tree looking for NP nodes with Tregex.

The current speed on my machine, for strings ranging from 1 to 250 words (upper bound tested), is approximately 0.4 s per call. Is there any way of increasing this speed? I tried setting the -Xms and -Xmx params, but this doesn't have any effect.
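
For context, the pipeline is created once up front and reused for every call. A minimal sketch of that setup (the annotator list and class layout here are assumptions, since the original initialization isn't shown):

    import java.util.Properties;
    import edu.stanford.nlp.pipeline.StanfordCoreNLP;

    public class Main {
        // Single shared pipeline instance, built once at startup and reused for every annotate call.
        public static StanfordCoreNLP pipeline;

        public static void main(String[] args) {
            Properties props = new Properties();
            // Assumed annotator list; "parse" is required for the TreeAnnotation used below.
            props.put("annotators", "tokenize, ssplit, pos, lemma, parse");
            pipeline = new StanfordCoreNLP(props);
        }
    }

The snippet that extracts noun phrases per sentence looks like this: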

    // One list of noun phrases per sentence in the input.
    List<List<String>> result = new ArrayList<>();

    Annotation annotation = new Annotation(testSentence);
    Main.pipeline.annotate(annotation);

    // Match any noun-phrase node in the parse tree.
    TregexPattern pattern = TregexPattern.compile("@NP");

    List<CoreMap> sentences = annotation.get(CoreAnnotations.SentencesAnnotation.class);
    for (CoreMap sen : sentences) {
        List<String> sentenceList = new ArrayList<>();
        Tree tree = sen.get(TreeAnnotation.class);

        TregexMatcher matcher = pattern.matcher(tree);
        while (matcher.find()) {
            Tree match = matcher.getMatch();
            List<Tree> leaves = match.getLeaves();

            // Join the leaf tokens of the matched NP back into a single string.
            String nounPhrase = Joiner.on(' ').join(Lists.transform(leaves, Functions.toStringFunction()));
            sentenceList.add(nounPhrase);
        }

        result.add(sentenceList);
    }

1 Answer


You could try setting the threads property; it should improve performance. See Sebastian Schuster's answer on this topic: https://stackoverflow.com/a/30686865/3177662

Properties props = new Properties();
props.put("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref");
// Let annotators that support it process sentences in parallel.
props.put("threads", "8");
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
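
A quick usage sketch, assuming the threaded pipeline above is the one held in Main (the input text is just a placeholder). With "threads" set, annotators that support it can process the sentences of a single document in parallel, so longer inputs benefit the most:

    // Placeholder multi-sentence input; sentence-level annotators (pos, parse, ...)
    // can run over the sentences in parallel once "threads" is configured.
    Annotation annotation = new Annotation("First sentence. Second sentence. Third sentence.");
    pipeline.annotate(annotation);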