
To start with, a bit of context: in my Kubernetes cluster there is a Spark app running, and I want to add a deployment that starts the Spark history server, which will read the logs generated by that app on a shared volume.

Due to a security measure in the project, I can't use the Spark operator image directly in my Dockerfile, so I install Spark via a conda env with PySpark instead. I also set SPARK_HISTORY_OPTS with an ENV instruction rather than using the config file, since they should be equivalent:

SPARK_HISTORY_OPTS='-Dspark.history.fs.logDirectory=/execution-events -Dspark.eventLog.dir=/execution-events -Dspark.eventLog.enabled=true -Dspark.history.fs.cleaner.enabled=true -Dspark.history.ui.port=4039'
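Roughly, the relevant part of my Dockerfile looks like this (the base image here is just an example; the env and version names match my conda env):

FROM continuumio/miniconda3

# Install Spark through a conda env with pyspark instead of the Spark operator image
RUN conda create -y -n spark-env-3.1.2 python=3.7 && \
    conda run -n spark-env-3.1.2 pip install pyspark==3.1.2

# Bake the history server configuration in as an env var instead of spark-defaults.conf
ENV SPARK_HISTORY_OPTS='-Dspark.history.fs.logDirectory=/execution-events -Dspark.eventLog.dir=/execution-events -Dspark.eventLog.enabled=true -Dspark.history.fs.cleaner.enabled=true -Dspark.history.ui.port=4039'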

The shared volume mounted on the deployment uses the same path, /execution-events.

In my custom entrypoint.sh file there are a few steps (see the sketch after this list):

- export SPARK_HOME
- start the Spark history server with a simple: exec /usr/bin/tini -s -- $SPARK_HOME/sbin/start-history-server.sh
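Put together, the entrypoint looks roughly like this (the SPARK_HOME path matches the pyspark install inside the conda env, as seen in the pod log below):

#!/bin/bash
set -ex

# SPARK_HOME points at the pyspark package installed inside the conda env
export SPARK_HOME=/opt/conda/envs/spark-env-3.1.2/lib/python3.7/site-packages/pyspark

# Run the history server start script under tini as PID 1
exec /usr/bin/tini -s -- "$SPARK_HOME/sbin/start-history-server.sh"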

When I watch the deployment being created, the pod starts the server, but then it dies into the Completed state and restarts into CrashLoopBackOff, which is something I don't understand.

The Spark history server should stay alive until I run the stop-history-server.sh script, so why doesn't it stay alive?

Thanks in advance for any answers.

PS: When I add a sleep of around 5 minutes to debug, exec into the pod manually, and start the server by hand, I can see the message that the Spark history server starts.

And I can see in the logs folder that the files are created.

This is the message in the pod's log:

+ exec /usr/bin/tini -s -- /opt/conda/envs/spark-env-3.1.2/lib/python3.7/site-packages/pyspark/sbin/start-history-server.sh
starting org.apache.spark.deploy.history.HistoryServer, logging to /opt/conda/envs/spark-env-3.1.2/lib/python3.7/site-packages/pyspark/logs/spark--org.apache.spark.deploy.history.HistoryServer-1-spark-histor
Stream closed EOF for ***NAMESPACE***/spark-history-deployment-65dd4dd6f5-wk27t (spark-history-container)
• Refer to Nishant's Devtron blog post, [Troubleshoot: Pod Crashloopbackoff](https://devtron.ai/blog/troubleshoot_crashloopbackoff_pod/), on how to spot a CrashLoopBackOff error, which may help resolve your issue. – Veera Nagireddy Mar 12 '23 at 07:13

1 Answer


The problem was something I found recently: in the entrypoint.sh file where I start the start-history-server.sh script, I need to set an env var that tells the daemon script to run in the foreground instead of the background, so the pod stays alive.

Add this before executing start-history-server.sh:

export SPARK_NO_DAEMONIZE=true

(As far as I can tell, spark-daemon.sh only checks whether SPARK_NO_DAEMONIZE is set, not its value, so even =false keeps the process in the foreground, but true reads as intended.)
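In context, the entrypoint.sh from the question then becomes (same paths as above, only the new export added):

#!/bin/bash
set -ex

export SPARK_HOME=/opt/conda/envs/spark-env-3.1.2/lib/python3.7/site-packages/pyspark

# Keep the HistoryServer in the foreground so PID 1 doesn't exit
# right after the daemon script forks it into the background.
export SPARK_NO_DAEMONIZE=true

exec /usr/bin/tini -s -- "$SPARK_HOME/sbin/start-history-server.sh"

Without this, start-history-server.sh delegates to spark-daemon.sh, which launches the JVM in the background and returns; the container's main process then exits, Kubernetes marks the pod Completed, and the restarts eventually land in CrashLoopBackOff.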

Hope it helps future readers with the same problem.
