The main problem is that we are unable to run Spark in client mode.
Whenever we try to connect to Spark on YARN from a Kubeflow notebook, we get the following error:
`Py4JJavaError: An error occurred while calling o81.showString.
: org.apache.spark.SparkException: Job 0 cancelled because SparkContext was shut down
at org.apache.spark.scheduler.DAGScheduler$$anonfun$cleanUpAfterSchedulerStop$1.apply(DAGScheduler.scala:932)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$cleanUpAfterSchedulerStop$1.apply(DAGScheduler.scala:930)`
This seems to be exactly the same issue as the one reported here:
What we have verified so far:
- We have managed to submit Spark jobs from the notebook.
- It is also possible to connect in cluster mode from the Kubeflow notebook.
- We have managed to run a Spark session from a Python shell on one of the Kubernetes worker servers. From there we can connect to the remote edge node, which is managed by Cloudera.
- We have checked that there is no network issue between the Hadoop clusters and the Kubernetes clusters.
However, we still cannot run interactive Spark (client mode) from the Jupyter notebook.
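
For reference, here is a minimal sketch of the kind of client-mode session setup involved. It is not a confirmed fix. In client mode the driver runs inside the notebook pod, so the YARN application master and executors must be able to connect back to it, which is a different direction from the Kubernetes-to-Hadoop connectivity checked above. The helper names (`client_mode_conf`, `build_session`, `main`), the port numbers, and the assumption that the pod IP is routable from the YARN nodes are all hypothetical; the `spark.*` keys are standard Spark configuration properties.

```python
import socket


def client_mode_conf(driver_host, driver_port=40000, block_manager_port=40001):
    """Return Spark settings for a YARN client-mode driver running in a pod.

    driver_host must be an address the YARN nodes can reach (assumption).
    The port numbers are placeholders; fixed ports make it possible to
    expose them from the pod instead of relying on random ones.
    """
    return {
        "spark.submit.deployMode": "client",
        # Listen on all interfaces inside the pod.
        "spark.driver.bindAddress": "0.0.0.0",
        # Address advertised to the application master and executors.
        "spark.driver.host": driver_host,
        "spark.driver.port": str(driver_port),
        "spark.blockManager.port": str(block_manager_port),
    }


def build_session(conf, app_name="kubeflow-client-mode-check"):
    """Create a SparkSession on YARN using the given settings."""
    # Imported here so the config helper can be used without pyspark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.master("yarn").appName(app_name)
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


def main():
    # Assumption: the pod's own IP is routable from the Hadoop cluster.
    pod_ip = socket.gethostbyname(socket.gethostname())
    spark = build_session(client_mode_conf(pod_ip))
    spark.range(5).show()  # the same kind of call that fails with showString above
    spark.stop()
```

Calling `main()` from a notebook cell would attempt the connection. It also assumes that `HADOOP_CONF_DIR` or `YARN_CONF_DIR` in the notebook environment points at the cluster's client configuration.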