I was testing a script that extracts tweets in real time using Spark Streaming. The tweets are supposed to be loaded into HDFS in an IBM BigInsights environment. The script is written in Python, and I use YARN for cluster management.
It runs fine in my local standalone environment, but when I submit it with
spark-submit --master yarn-cluster <name_of_script.py>
on my BigInsights server, it gives the following error:
error: [Errno 111] Connection refused
ERROR:py4j.java_gateway:An error occurred while trying to connect to the Java server
Traceback (most recent call last):
  File "/data/hadoop-swap/yarn/local/usercache/prtbhd/appcache/application_1497088854141_0001/container_1497088854141_0001_01_000001/py4j-0.9-src.zip/py4j/java_gateway.py", line 690, in start
    self.socket.connect((self.address, self.port))
  File "<string>", line 1, in connect
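As far as I can tell, Errno 111 on Linux is ECONNREFUSED, meaning the Python side tried to open a TCP connection to the py4j gateway port and nothing accepted it. The sketch below is not part of my script. It only reproduces the same kind of failure with plain sockets, assuming Python 3 and the standard library; the helper names try_connect and unused_local_port are made up for illustration.

```python
import errno
import socket


def try_connect(host, port, timeout=2.0):
    """Return None if the TCP connection succeeds, else the errno of the failure."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect((host, port))
        return None
    except OSError as exc:
        # A refused connection surfaces as ConnectionRefusedError (an OSError).
        return exc.errno
    finally:
        sock.close()


def unused_local_port():
    """Ask the OS for a free port, then release it so nothing is listening there."""
    probe = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    probe.bind(("127.0.0.1", 0))
    port = probe.getsockname()[1]
    probe.close()
    return port


# Connecting to a port with no listener should be refused by the OS.
code = try_connect("127.0.0.1", unused_local_port())
print(code, errno.errorcode.get(code))
```

So the traceback seems to say that no Java gateway was listening at the address and port the Python process used, but I don't know why that would happen only in yarn-cluster mode.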
Any idea why this error occurs and how to resolve it?