Setting spark.app.name for PySpark kernel with Jupyter Notebook

Question

I am running a Jupyter Notebook server with PySpark (as explained here) on a Hadoop cluster with YARN. I noticed that each Spark application launched via a new notebook appears in the Spark Web UI as an application named "PySparkShell" (the value of the "spark.app.name" configuration).

My problem is that I sometimes have many notebooks running in Jupyter, but all of them appear in Spark's Web UI with the same generic name, "PySparkShell". I know I can change the default name to something else, and I also know that I cannot change the app name once a SparkContext has been created. My question is: can I make it so that each application is given a different name when the kernel starts? (Preferably something that helps me connect the notebook name, e.g. 'Untitled.ipynb', to its Spark application name or ID.)

UPDATE: added a code snippet of my run command for the notebook

export DAEMON_PORT=8880
ANACONDA_PATH=/opt/cloudera/parcels/Anaconda/bin
export PATH=$ANACONDA_PATH:$PATH
export PYSPARK_DRIVER_PYTHON=$ANACONDA_PATH/jupyter
export PYSPARK_DRIVER_PYTHON_OPTS="notebook --NotebookApp.open_browser=False --NotebookApp.ip='*' --NotebookApp.port=$DAEMON_PORT"
pyspark2 \
--executor-memory 5g \
--executor-cores 4 \
--driver-memory 20g \
--conf spark.shuffle.service.enabled=true \
--conf spark.dynamicAllocation.enabled=true \
--conf spark.dynamicAllocation.minExecutors=0 \
--conf spark.dynamicAllocation.maxExecutors=40

score 0 · Answer 1 · answered May 07 '18 at 06:00

In the first few lines, where you create your SparkContext(), you can pass in a config object. You can use the config object to set various properties by chaining set('property_name', 'property_value') calls.

I'll demonstrate by setting the app name and the executor memory:

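from pyspark import SparkConf, SparkContext

# Set the application name and any other properties on the config object
conf = SparkConf().setAppName('Your_Project_name').set("spark.executor.memory", "5g")
# Pass the config by keyword; the first positional argument is the master URL
sc = SparkContext(conf=conf)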

pissall

This won't work, since the spark context is already created automatically when the kernel starts, and I can't change the configuration once that happens. – Zohar Meir May 08 '18 at 06:27
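
One possible workaround for the situation described in this comment, which nobody in this thread has confirmed, is to stop the automatically created context in the first notebook cell and start a new one that reuses its settings under a different name. The sketch below is a minimal illustration under several assumptions. The helpers make_app_name and restart_with_name are hypothetical names, not part of PySpark. The notebook name is passed in by hand, because the kernel does not know it. The copied settings may include runtime values written by the old context, which might need filtering on some clusters.

```python
import os


def make_app_name(notebook_path, prefix="jupyter"):
    """Build an app name such as 'jupyter:Untitled.ipynb' (hypothetical helper)."""
    return "{}:{}".format(prefix, os.path.basename(notebook_path))


def restart_with_name(sc, app_name):
    """Stop the existing SparkContext and start a new one with a new app name.

    Hypothetical helper: it copies the old context's settings, overrides
    spark.app.name, stops the old context, and returns a fresh one.
    """
    # Imported lazily so make_app_name can be used without pyspark installed
    from pyspark import SparkConf, SparkContext

    conf = SparkConf().setAll(sc.getConf().getAll()).setAppName(app_name)
    sc.stop()  # ends the old "PySparkShell" application
    return SparkContext(conf=conf)


# In the first cell of a notebook, where `sc` is the context created at kernel start:
# sc = restart_with_name(sc, make_app_name("Untitled.ipynb"))
```

Note that this launches a second YARN application after stopping the first, so kernel startup takes longer, and any existing `spark` session object that pointed at the old context would need to be re-created as well.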
