I am trying to run pycorenlp on a long text. To avoid the error message "CoreNLP request timed out. Your document may be too long", I increased the Stanford CoreNLP timeout by setting the timeout property in the request.

Here is the code I use (it's a simplified version of pycorenlp's example.py):

from pycorenlp import StanfordCoreNLP

if __name__ == '__main__':
    nlp = StanfordCoreNLP('http://localhost:9000')
    text = (
        'Pusheen and Smitha walked along the beach. Pusheen wanted to surf, '
        'but fell off the surfboard.')
    output = nlp.annotate(text, properties={
        'timeout': '10001',  # Setting the timeout to 10000 or below "fixes" the issue.
        'annotators': 'tokenize,ssplit,pos,depparse,parse',
        'outputFormat': 'json'
    })
    print(output)

It outputs server: unknown error. The server log contains:

java.net.UnknownHostException: server: server: unknown error
    at java.net.InetAddress.getLocalHost(InetAddress.java:1505)
    at edu.stanford.nlp.pipeline.StanfordCoreNLPServer$CoreNLPHandler.handle(StanfordCoreNLPServer.java:393)
    at com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:79)
    at sun.net.httpserver.AuthFilter.doFilter(AuthFilter.java:83)
    at com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:82)
    at sun.net.httpserver.ServerImpl$Exchange$LinkHandler.handle(ServerImpl.java:675)
    at com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:79)
    at sun.net.httpserver.ServerImpl$Exchange.run(ServerImpl.java:647)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.UnknownHostException: server: unknown error
    at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
    at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:928)
    at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1323)
    at java.net.InetAddress.getLocalHost(InetAddress.java:1500)
    ... 10 more

The Stanford CoreNLP server was launched using:

java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer 9000

I don't want to segment the text into smaller texts.

Is there any way to set the timeout higher than 10000 (i.e., higher than 10 seconds)?

It works fine on Mac OS X 10.10 (java version "1.8.0_60"); the issue arises on Ubuntu 14.04 (java version "1.8.0_77"). Both machines have Python 2.7, pycorenlp 0.2.0, and Stanford CoreNLP 3.6.0.

Franck Dernoncourt

1 Answer

To be clear, you are not seeing this issue when you run the server on the Mac?

Then this appears to be an issue with the machine you are running the server on. The server code is trying to call:

InetAddress.getLocalHost().getHostName()

and getting an exception.
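
One way to check this outside the JVM is a quick lookup from Python on the same machine. This is only a rough sketch: it uses Python's socket module rather than Java's InetAddress, so the two may not always agree, and the helper name check_local_hostname is made up for illustration.

```python
import socket


def check_local_hostname(hostname=None, resolver=socket.gethostbyname):
    """Return (hostname, address, error); address is None if the lookup fails.

    Hypothetical helper: a Python-side approximation of the lookup the
    server performs, not the JVM lookup itself.
    """
    name = hostname if hostname is not None else socket.gethostname()
    try:
        return name, resolver(name), None
    except socket.error as exc:  # socket.gaierror is a subclass
        return name, None, str(exc)


if __name__ == "__main__":
    name, address, error = check_local_hostname()
    if error is None:
        print("%s resolves to %s" % (name, address))
    else:
        print("%s does not resolve: %s" % (name, error))
```

If this reports that the hostname does not resolve on the Ubuntu machine but does on the Mac, that would point at the machine's name resolution rather than at CoreNLP or the timeout value.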

Here is a thread I found where someone had a similar issue:

InetAddress.getLocalHost() throws UnknownHostException

What is in your /etc/hosts file on the machine where you're trying to run the server?
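
If it helps, here is a small sketch for checking whether the machine's hostname appears in that file. The function names are invented for this example, and it assumes the usual hosts layout (an IP address followed by names and aliases, with # starting a comment).

```python
import socket


def hostnames_in_hosts(text):
    """Collect every hostname and alias listed in hosts-file text."""
    names = set()
    for line in text.splitlines():
        # Drop comments, then split into whitespace-separated fields.
        fields = line.split("#", 1)[0].split()
        # The first field is the IP address; the rest are names and aliases.
        names.update(fields[1:])
    return names


def hostname_is_listed(hostname, hosts_path="/etc/hosts"):
    """Return True if hostname has an entry in the given hosts file."""
    with open(hosts_path) as handle:
        return hostname in hostnames_in_hosts(handle.read())


if __name__ == "__main__":
    name = socket.gethostname()
    print("%s listed in /etc/hosts: %s" % (name, hostname_is_listed(name)))
```

A missing entry is not necessarily the cause, since the name could also be resolved through DNS, but it is a common reason for getLocalHost() to fail.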

StanfordNLPHelp