Both kernels, PySpark and the default IPython, can use the `python3` interpreter for PySpark. The interpreter is set in `~/.sparkmagic/config.json`. This is standard Spark configuration that sparkmagic simply passes through to the Livy server running on the Spark master node:
"session_configs": {
"conf": {
"spark.pyspark.python":"python3"
}
}
From the Spark configuration docs:

> `spark.pyspark.python`: Python binary executable to use for PySpark in both driver and executors.
Here `python3` must be available as a command on the PATH of every node in the Spark cluster. You can also install Python into a custom directory on each node and point to the full path: `"spark.pyspark.python": "/Users/hadoop/python3.8/bin/python"`.
Any Spark conf option can be passed through `session_configs` in the same way.
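As an alternative to editing `config.json`, sparkmagic also supports setting these options per notebook with the `%%configure` magic, run before the session starts. A sketch; the `spark.executor.memory` value here is just an illustrative assumption:

```
%%configure -f
{
    "conf": {
        "spark.pyspark.python": "python3",
        "spark.executor.memory": "4g"
    }
}
```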
There are two ways to make `tensorflow` importable:
- install it on all Spark machines (master and workers) via `python3 -m pip install tensorflow`
- zip it, upload the archive, and pass the remote path through sparkmagic via the `spark.submit.pyFiles` setting, which accepts a path on S3, HDFS, or the master node's file system (not a path on your machine); see the sketch below
See the answer about `--py-files`.
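For the second option, a minimal sketch of both ways to pass the archive, plus a quick import check. The bucket, archive, and module names are hypothetical:

```python
# Option 1: set it before the session starts, in ~/.sparkmagic/config.json:
#
#   "session_configs": {
#       "conf": {
#           "spark.submit.pyFiles": "s3://my-bucket/deps.zip"
#       }
#   }
#
# Option 2: add the archive to an already running session:
sc.addPyFile("s3://my-bucket/deps.zip")  # hypothetical bucket/archive

# Modules packaged in the zip should then be importable inside tasks:
def check_import(_):
    import mymodule  # hypothetical module inside deps.zip
    return mymodule.__file__

print(sc.parallelize([0]).map(check_import).collect())
```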