I followed the instructions for running Spark on Kubernetes at https://spark.apache.org/docs/3.1.1/running-on-kubernetes.html#content
After submitting the example, which launches Spark Pi in cluster mode, I ran into an error that I can't understand.
This is the command line:
./bin/spark-submit \
--master k8s://https://2-120:6443 \
--deploy-mode cluster \
--name spark-pi \
--class org.apache.spark.examples.SparkPi \
--conf spark.executor.instances=5 \
--conf spark.kubernetes.container.image=ethanzhang1997/spark:3.1.1 \
local:///path/to/examples.jar
Here is the error: [error output attached as an image]
I think the container should use the Java environment inside the image, but it tried to read JAVA_HOME from my local machine instead.
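If it helps, my understanding is that spark-submit itself runs on the host, so bin/spark-class on the host has to find a JVM before anything is sent to the cluster. Roughly, the launcher script does something like this (paraphrased from memory of the 3.1.x sources, so the exact wording may differ):

# Java lookup in bin/spark-class (paraphrased, not verbatim)
if [ -n "${JAVA_HOME}" ]; then
  RUNNER="${JAVA_HOME}/bin/java"
elif [ "$(command -v java)" ]; then
  RUNNER="java"
else
  echo "JAVA_HOME is not set" >&2
  exit 1
fi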
Any help would be much appreciated, thanks!
I have now temporarily solved this problem: I downloaded the matching JDK into the Spark directory and added the following lines to the Dockerfile that is used to build the Spark image:
# Copy and unpack the JDK tarball (it sits next to the Dockerfile in the build context)
RUN mkdir -p /home/deploy
ADD jdk-8u201-linux-x64.tar.gz /home/deploy/
# Make the bundled JDK the default Java inside the image
ENV JAVA_HOME /home/deploy/jdk1.8.0_201
ENV JRE_HOME ${JAVA_HOME}/jre
ENV CLASSPATH .:${JAVA_HOME}/lib:${JRE_HOME}/lib
ENV PATH ${JAVA_HOME}/bin:$PATH
This ensures that JAVA_HOME is the same inside the image as on the host machine.
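For reference, the image can then be rebuilt and pushed with Spark's docker-image-tool.sh from the same docs page (the repository name and tag below just mirror the image name from my spark-submit command; adjust them if you reproduce this):

# Rebuild the image (assumes the edited Dockerfile is the default one the tool uses) and push it,
# producing ethanzhang1997/spark:3.1.1 as referenced by spark.kubernetes.container.image
./bin/docker-image-tool.sh -r ethanzhang1997 -t 3.1.1 build
./bin/docker-image-tool.sh -r ethanzhang1997 -t 3.1.1 push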
But there is still one thing I can't understand: the Hadoop and Spark environments also differ between my host machine and the image. Why doesn't that cause a problem? I noticed that there is a step that mounts the Spark directory into the image, but how does it work?
By the way, it seems that the official guidance on Spark on Kubernetes uses openjdk 11 as the default base image. But if a user's JAVA_HOME is not set up like that, wouldn't there be a problem?