I am trying to configure the Spark interpreter in Zeppelin 0.10.0, installed on my local machine, so that I can run scripts on a Spark cluster that also runs locally in Docker. I am using the docker-compose.yml from https://github.com/big-data-europe/docker-spark with Spark version 3.1.2.
After `docker-compose up`, I can see the Spark master UI in the browser on localhost:8080 and the History Server on localhost:18081. After reading the ID of the spark-master container, I can also run a shell and spark-shell inside it (`docker exec -it xxxxxxxxxxxx /bin/bash`). The host OS is Ubuntu 20.04; `spark.master` in Zeppelin is currently set to `spark://localhost:7077`, and `zeppelin.server.port` in zeppelin-site.xml to 8070.
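For completeness, these are roughly the steps I follow (the container ID is a placeholder, and the in-container Spark path assumes the bde2020 image default of /spark):

```bash
# Start the bde2020 Spark cluster in the background
docker-compose up -d

# Find the spark-master container ID
docker ps

# Open a shell in the master container and start a spark-shell against it
docker exec -it xxxxxxxxxxxx /bin/bash
/spark/bin/spark-shell --master spark://spark-master:7077
```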
There is a lot of information about connecting from a container running Zeppelin, or about running both Spark and Zeppelin in the same container. Unfortunately, I also use this Zeppelin instance to connect to Hive via JDBC on a VirtualBox Hortonworks cluster (as in one of my previous posts), and I wouldn't want to change that configuration now due to hardware constraints. In one of the posts (Running zeppelin on spark cluster mode) I saw that such a connection is possible, but all my attempts end with the message "Fail to open SparkInterpreter".
I would be grateful for any tips.
– uhlik
1 Answer
You need to change `spark.master` in Zeppelin to point to the Spark master inside the Docker container, not on the local machine, so `spark://localhost:7077` won't work. The port `7077` itself is fine, because that is the port specified in the docker-compose file you are using. To get the IP address of the Docker container you can follow this answer. Since I suppose your container is named `spark-master`, you can try the following:
docker inspect -f '{{range.NetworkSettings.Networks}}{{.IPAddress}}{{end}}' spark-master
Then specify this as `spark.master` in Zeppelin: `spark://<docker-ip>:7077`
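As a sanity check, you can try reaching the master from the host before touching Zeppelin. This is a sketch that assumes you have a matching Spark install on the host (the /opt/spark path is an assumption):

```bash
# Look up the master container's IP on the Docker bridge network
MASTER_IP=$(docker inspect -f '{{range.NetworkSettings.Networks}}{{.IPAddress}}{{end}}' spark-master)

# If a host-side spark-shell can connect, the same URL should work
# as spark.master in Zeppelin (/opt/spark is an assumed install path)
/opt/spark/bin/spark-shell --master "spark://$MASTER_IP:7077"
```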

– viggnah
- Thank you, unfortunately it did not solve the problem. The spark-master IP is indeed 172.21.0.2, but while browsing through the logs I noticed the message "requirement failed: Can only call getServletHandlers on a running MetricsSystem". – uhlik Aug 11 '22 at 08:20
- The error is probably caused by running a different Spark version in the cluster than the one used for spark-submit. The [Supported Interpreters list](https://zeppelin.apache.org/supported_interpreters.html) does not include version 0.10.0, but this Zeppelin previously worked fine with Spark 3.1.3 with SPARK_HOME=/opt/spark; at the moment there is no value there. – uhlik Aug 11 '22 at 08:28
- The final solution is to set spark.master to spark://172.18.0.2:7077 (in my case), as suggested by @viggnah, and to downgrade the Spark version to 2.4.5-hadoop2.7 in the bde2020 docker-compose file. It is the same version as pre-installed in Zeppelin 0.10.0 (the pre-installed version can be checked by setting spark.master to local[*] in Interpreters/spark and running `val sparkver = sc.version` in the notebook; see the sketch below). – uhlik Aug 11 '22 at 16:46
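A hedged sketch of that version check (the container name and the in-container Spark path follow the bde2020 defaults and may differ in your setup):

```bash
# Print the Spark version running in the Dockerized cluster
# (the bde2020 images keep Spark under /spark; adjust if yours differs)
docker exec spark-master /spark/bin/spark-submit --version

# For the version bundled with Zeppelin: leave SPARK_HOME unset, set
# spark.master to local[*] in the interpreter settings, and run this
# in a notebook paragraph:
#   val sparkver = sc.version
# Both outputs should report the same Spark version.
```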