
When I try to run a corpus pipeline on language resources, it throws the error below, even though I follow the order Document Reset, English tokeniser, sentence splitter. Can someone help me with how to debug this run-time error?

Error:

gate.creole.ExecutionException: No sentences or tokens to process in document Password_Safe-window1.txt_0003E
Please run a sentence splitter and tokeniser first!
    at gate.creole.POSTagger.execute(POSTagger.java:257)
    at gate.util.Benchmark.executeWithBenchmarking(Benchmark.java:291)
    at gate.creole.SerialController.runComponent(SerialController.java:225)
    at gate.creole.SerialController.executeImpl(SerialController.java:157)
    at gate.creole.SerialAnalyserController.executeImpl(SerialAnalyserController.java:223)
    at gate.creole.SerialAnalyserController.execute(SerialAnalyserController.java:126)
    at gate.util.Benchmark.executeWithBenchmarking(Benchmark.java:291)
    at gate.gui.SerialControllerEditor$RunAction$1.run(SerialControllerEditor.java:1759)
    at java.lang.Thread.run(Thread.java:745)

Edit:

The files are not empty. I tried @dedek's suggestion and it threw no errors, but it raised another problem:

Exception in thread "ApplicationViewer1" java.lang.OutOfMemoryError: Java heap space

  • Please don't post comments as a new answer. Rather edit your question (I did it already) or ask another one. Or add a comment to your question or my answer... – dedek Oct 26 '16 at 06:16
  • If your files are not empty, then something is wrong with your GATE application, but it is impossible to guess what. You would have to post more details about your GATE app (English tokeniser, sentence splitter: what is the order, which annotation sets, etc.) – dedek Oct 26 '16 at 06:20
  • As for `OutOfMemoryError: Java heap space` see my edited answer... – dedek Oct 26 '16 at 06:20


I think it is because your document is empty. Can you confirm that?
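One way to confirm this outside GATE is to scan the files the corpus was populated from and list those that are empty or whitespace only. This is a minimal stdlib sketch, assuming the corpus was built from a directory of plain-text files and that they are UTF-8; the class name `BlankCorpusCheck` and its method are hypothetical, not part of GATE.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class BlankCorpusCheck {

    // Returns every regular file under dir (recursively) whose content is
    // empty or whitespace only. Decoding as UTF-8 is an assumption; use the
    // encoding you actually load the corpus with.
    static List<Path> findBlankFiles(Path dir) throws IOException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(dir)) {
            files = walk.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        List<Path> blank = new ArrayList<>();
        for (Path file : files) {
            String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            if (text.trim().isEmpty()) {
                blank.add(file);
            }
        }
        return blank;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: java BlankCorpusCheck /path/to/corpus-dir
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        List<Path> blank = findBlankFiles(dir);
        if (blank.isEmpty()) {
            System.out.println("No empty files found under " + dir);
        } else {
            blank.forEach(p -> System.out.println("empty or blank: " + p));
        }
    }
}
```

If nothing is reported, the input text is probably fine and the problem is more likely in how the pipeline is configured (for example, which annotation sets the tokeniser, splitter and tagger read from and write to).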

The POSTagger has a run-time parameter, failOnMissingInputAnnotations; set it to false and it should be OK.

See also the docs:

failOnMissingInputAnnotations - if set to false, the PR will not fail with an ExecutionException if no input Annotations are found and instead only log a single warning message per session and a debug message per document that has no input annotations (run-time, default = true).


Concerning the OutOfMemoryError: Java heap space
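A first check is whether the JVM actually got the heap you intended. The maximum heap is controlled by the standard `-Xmx` JVM option; where that option is set depends on how GATE is launched (in many GATE Developer installations it is the `gate.l4j.ini` file next to the launcher, but check your install). This small sketch only uses `java.lang.Runtime`; run standalone (e.g. `java -Xmx2g HeapCheck`) it shows what a given set of flags gives you, and if you use GATE Embedded you can call the same `Runtime` methods from inside your own application. The class name `HeapCheck` is hypothetical.

```java
public class HeapCheck {

    // Converts a byte count to whole mebibytes (rounded down).
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory(): the most heap this JVM will attempt to use, which is
        // what -Xmx controls; Long.MAX_VALUE would mean no inherent limit.
        long max = rt.maxMemory();
        // totalMemory(): heap currently reserved; freeMemory(): unused part of it.
        long used = rt.totalMemory() - rt.freeMemory();
        System.out.println("max heap:  " + toMiB(max) + " MiB");
        System.out.println("used heap: " + toMiB(used) + " MiB");
    }
}
```

If the reported maximum is much lower than expected, the `-Xmx` setting is not reaching the JVM that runs the pipeline.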

See the following questions:
