
I was using the Stanford Lexparser recently. Unfortunately, I have encountered a problem: it takes a very long time, especially when I pass in a large file. Will multithreading help improve the performance? I know that multithreading can easily be done from the command line, but I would like to multithread it internally using the API. Currently, I am using this code. How can I make it multithreaded?

// lp is a LexicalizedParser, pw is a PrintWriter
for (List<HasWord> sentence : new DocumentPreprocessor(fileReader)) {
    Tree parse = lp.apply(sentence);
    TreePrint tp = new TreePrint("typedDependenciesCollapsed");
    tp.printTree(parse, pw);
}
edwin
  • Have you tried anything yet to multithread it? [here is a post](http://stackoverflow.com/a/3330440/3960399) that goes into some detail about different approaches – Frank Bryce Jan 29 '16 at 16:30
  • does printTree literally print something? If so, then most time is spent by the printer anyway, and multithreading won't help. – Alexei Kaigorodov Jan 29 '16 at 18:03

1 Answer


You can just use regular old Java threads to annotate documents in parallel. For example:

import java.util.Properties;

import edu.stanford.nlp.ling.CoreAnnotations.SentencesAnnotation;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.trees.Tree;
import edu.stanford.nlp.trees.TreeCoreAnnotations.TreeAnnotation;

Properties props = new Properties();
props.setProperty("annotators", "tokenize,ssplit,parse");
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

Annotation ann = new Annotation("your sentence here");
for (int i = 0; i < 100; ++i) {
  new Thread() {
    @Override public void run() {
      pipeline.annotate(ann);  // in practice, annotate a different document in each thread
      Tree tree = ann.get(SentencesAnnotation.class).get(0).get(TreeAnnotation.class);
    }
  }.start();
}
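If you want to bound the number of concurrent parses, a fixed-size thread pool is one way to follow the comment in the code above and annotate a different document per task. This is a minimal sketch, assuming `documents` is a hypothetical `List<String>` of your own input texts:

import java.util.List;
import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import edu.stanford.nlp.ling.CoreAnnotations.SentencesAnnotation;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.trees.Tree;
import edu.stanford.nlp.trees.TreeCoreAnnotations.TreeAnnotation;

Properties props = new Properties();
props.setProperty("annotators", "tokenize,ssplit,parse");
// one shared pipeline; annotate() is called from several threads, as in the example above
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

ExecutorService pool = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
for (String text : documents) {  // `documents` is a hypothetical list of your inputs
  pool.submit(() -> {
    Annotation ann = new Annotation(text);
    pipeline.annotate(ann);
    Tree tree = ann.get(SentencesAnnotation.class).get(0).get(TreeAnnotation.class);
    // do something with the tree here
  });
}
pool.shutdown();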

Another option is to use the simple API:

import edu.stanford.nlp.simple.Sentence;
import edu.stanford.nlp.trees.Tree;

for (int i = 0; i < 100; ++i) {
  new Thread() {
    @Override public void run() {
      // in practice, parse a different sentence in each thread
      Tree tree = new Sentence("your sentence").parse();
    }
  }.start();
}

At a high level though, you're unlikely to get a phenomenally huge speedup from multithreading. Parsing is generally slow (O(n^3) with respect to sentence length), and multithreading gives you at most a linear speedup in the number of cores. An alternative way to make things faster would be to use the shift-reduce parser, or, if you're OK with dependency rather than constituency parses, the Stanford Neural Dependency Parser (see the sketch below).
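For reference, a rough sketch of what switching parsers could look like with a CoreNLP pipeline. The shift-reduce model path is an assumption based on the separately distributed SR models package; both parsers also need POS tags, hence the added `pos` annotator:

import java.util.Properties;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;

// Shift-reduce constituency parser: faster than the default PCFG parser,
// but requires the separate SR models jar on the classpath.
Properties srProps = new Properties();
srProps.setProperty("annotators", "tokenize,ssplit,pos,parse");
srProps.setProperty("parse.model", "edu/stanford/nlp/models/srparser/englishSR.ser.gz");
StanfordCoreNLP srPipeline = new StanfordCoreNLP(srProps);

// Neural dependency parser: produces dependency parses only, no constituency trees.
Properties depProps = new Properties();
depProps.setProperty("annotators", "tokenize,ssplit,pos,depparse");
StanfordCoreNLP depPipeline = new StanfordCoreNLP(depProps);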

Gabor Angeli