I am using the Stanford CoreNLP model in an algorithm. CoreNLP is written in Java and includes a Java client to its server (StanfordCoreNLPClient), so it can be used via the command line or its web service. Stanford has also developed a Python package, called Stanza, which includes an API for making requests to the Stanford CoreNLP server.

The model works well in a notebook and on my personal computer. However, I have not managed to install stanza on an AWS EMR cluster. I always get the following error, which I have not been able to resolve:

[screenshot of the error message]

I therefore tried another Python package for talking to the Stanford CoreNLP server. The only one I managed to install easily on my AWS EMR cluster is PyNLP (https://github.com/sina-al/pynlp), a Python wrapper for Stanford CoreNLP by Sina. Again, it works well in a notebook and on my personal computer, and this time I was able to install it (from PyPI) on the EMR cluster. But whenever I instantiate a StanfordCoreNLP object, I get the following error: "HTTPConnectionPool(host='127.0.0.1', port=9000): Max retries exceeded with url: /?properties=%7B%22serializer%22%3A+%22edu.stanford.nlp.pipeline.ProtobufAnnotationSerializer%22%2C+%22outputFormat%22%3A+%22serialized%22%2C+%22annotators%22%3A+%22entitymentions%22%7D (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f6face84110>: Failed to establish a new connection: [Errno 111] Connection refused'))". I really don't understand why this happens, and above all why it works in a notebook but not on AWS EMR.

For information, the algorithm running on the AWS EMR cluster can reach the internet: I can use the "requests" module and call requests.get, which works fine.

Could anyone explain why I get this error on AWS EMR but not in the notebook or on my personal computer? Are the ports blocked on AWS EMR? What can I do to make it work?
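
One way to narrow this down is to test the TCP connection directly on the node where the code runs. The following is a minimal sketch, assuming the CoreNLP server is expected at 127.0.0.1:9000 as in the error message; `is_port_open` is a hypothetical helper name and is not part of PyNLP or Stanza.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        # create_connection raises OSError (e.g. ConnectionRefusedError,
        # timeout) when nothing accepts the connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Address and port taken from the error message; adjust if the
    # server is expected elsewhere.
    print(is_port_open("127.0.0.1", 9000))
```

If this prints False on the EMR node, the likely explanation is that no CoreNLP server process is listening on that port on that machine, since "[Errno 111] Connection refused" on the loopback address usually indicates a missing listener, not a blocked outbound port.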

Thanks in advance for your precious help!!!

  • From the error message it seems that there is no space left on the device. What volume is attached to your EMR? And what instance do you use? Maybe you find [this SO answer](https://stackoverflow.com/a/20515528/5493813t) or [this AWS post](https://aws.amazon.com/de/premiumsupport/knowledge-center/no-space-left-on-device-emr-spark/) helpful – st.huber Jan 03 '22 at 07:37

1 Answer

This issue comes from the Python requests library that we're using. Those requests are being blocked by the domain we're hitting frequently. We need to use Scrapy instead of Python requests.

farooq
  • As it’s currently written, your answer is unclear. Please [edit] to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers [in the help center](/help/how-to-answer). – Community May 18 '22 at 08:45