I am trying to run pycorenlp on a long text. To avoid the error message "CoreNLP request timed out. Your document may be too long", I increased the Stanford CoreNLP timeout by specifying the timeout property.
Here is the code I use (it's a simplified version of pycorenlp's example.py):
from pycorenlp import StanfordCoreNLP

if __name__ == '__main__':
    nlp = StanfordCoreNLP('http://localhost:9000')
    text = (
        'Pusheen and Smitha walked along the beach. Pusheen wanted to surf, '
        'but fell off the surfboard.')
    output = nlp.annotate(text, properties={
        'timeout': '10001',  # Setting the timeout to 10000 or below "fixes" the issue.
        'annotators': 'tokenize,ssplit,pos,depparse,parse',
        'outputFormat': 'json'
    })
    print(output)
It outputs "server: unknown error". The server log contains:
java.net.UnknownHostException: server: server: unknown error
    at java.net.InetAddress.getLocalHost(InetAddress.java:1505)
    at edu.stanford.nlp.pipeline.StanfordCoreNLPServer$CoreNLPHandler.handle(StanfordCoreNLPServer.java:393)
    at com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:79)
    at sun.net.httpserver.AuthFilter.doFilter(AuthFilter.java:83)
    at com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:82)
    at sun.net.httpserver.ServerImpl$Exchange$LinkHandler.handle(ServerImpl.java:675)
    at com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:79)
    at sun.net.httpserver.ServerImpl$Exchange.run(ServerImpl.java:647)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.UnknownHostException: server: unknown error
    at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
    at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:928)
    at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1323)
    at java.net.InetAddress.getLocalHost(InetAddress.java:1500)
    ... 10 more
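The trace fails inside InetAddress.getLocalHost, which looks up the machine's own hostname. As a rough cross-check from Python, the sketch below tests whether the local hostname resolves on the Ubuntu machine. The function name check_local_hostname is my own, and Python's lookup is not guaranteed to behave exactly like Java's, so treat a result here as a hint rather than a diagnosis.

```python
import socket


def check_local_hostname():
    """Return (hostname, address); address is None if the lookup fails."""
    hostname = socket.gethostname()
    try:
        # Resolve the machine's own hostname to an IPv4 address.
        address = socket.gethostbyname(hostname)
    except socket.error:
        # socket.gaierror and socket.herror are subclasses of socket.error.
        address = None
    return hostname, address


if __name__ == '__main__':
    name, addr = check_local_hostname()
    if addr is None:
        print('Hostname %r does not resolve' % name)
    else:
        print('Hostname %r resolves to %s' % (name, addr))
```

If the hostname does not resolve here either, the problem is probably in the machine's name resolution setup rather than in pycorenlp itself, although I have not confirmed that this is what the server hits.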
The Stanford CoreNLP server was launched using:
java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer 9000
I don't want to segment the text into smaller texts.
Is there any way to set the timeout higher than 10000 (i.e., higher than 10 seconds)?
It works fine on Mac OS X 10.10 (java version "1.8.0_60"); the issue arises on Ubuntu 14.04 (java version "1.8.0_77"). Both have Python 2.7, pycorenlp 0.2.0, and Stanford CoreNLP 3.6.0.