I am using Stanford CoreNLP in an algorithm. CoreNLP is written in Java and can be used via the command line or its web service; it also includes a Java client to the server (StanfordCoreNLPClient). The Stanford NLP Group has also developed a Python package, Stanza, which includes an API for making requests to a Stanford CoreNLP server.
The model works well in a notebook and on my personal computer. However, I have not managed to install Stanza on an AWS EMR cluster; I always get the following error, which I have not been able to resolve:
I therefore tried another Python package for talking to the Stanford CoreNLP server. The only one I managed to install easily on my AWS EMR cluster is PyNLP (https://github.com/sina-al/pynlp), a Python wrapper for Stanford CoreNLP by Sina. Again, it works well in a notebook and on my personal computer, and this time I did manage to install it (from PyPI) on the EMR cluster. However, whenever I instantiate a StanfordCoreNLP object, I get the following error: "HTTPConnectionPool(host='127.0.0.1', port=9000): Max retries exceeded with url: /?properties=%7B%22serializer%22%3A+%22edu.stanford.nlp.pipeline.ProtobufAnnotationSerializer%22%2C+%22outputFormat%22%3A+%22serialized%22%2C+%22annotators%22%3A+%22entitymentions%22%7D (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f6face84110>: Failed to establish a new connection: [Errno 111] Connection refused'))". I really don't understand why this happens, and above all why it works in a notebook but not on AWS EMR.
For information, the algorithm running on the AWS EMR cluster can reach the internet: I can use the "requests" module and call requests.get, which works fine.
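Note that outbound internet access and the error above are separate matters: [Errno 111] means the TCP connection to 127.0.0.1:9000 was refused on the machine where the code ran. A quick way to narrow this down might be to check whether anything is accepting connections on that port on the EMR node. The following is only a minimal sketch using the standard library; `is_port_open` is a hypothetical helper name of my own, not part of PyNLP, Stanza, or CoreNLP, and the host and port are simply the ones shown in the error message.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection raises OSError if the connection cannot be made.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers ConnectionRefusedError (Errno 111), timeouts, and
        # unreachable hosts.
        return False


if __name__ == "__main__":
    # 127.0.0.1:9000 is the address taken from the error message above.
    print(is_port_open("127.0.0.1", 9000))
```

Running this in the same place where the StanfordCoreNLP object is instantiated (for example on the EMR master node, or inside a task on a worker) would show whether a server is reachable at that address from that particular machine.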
Could anyone explain to me why I get this error on AWS EMR but not in the notebook or on my personal computer? Are the ports blocked on AWS EMR? What can I do to make it work?
Thanks in advance for your help!