
I have configured Spark 2.1 on my remote Linux server (IBM RHEL Z Systems). I am trying to create a SparkContext with the following code:

from pyspark.context import SparkContext, SparkConf

# Point the context at the standalone master on the remote server
master_url = "spark://<IP>:7077"
conf = SparkConf()
conf.setMaster(master_url)
conf.setAppName("App1")
sc = SparkContext.getOrCreate(conf)

Running this gives the error below. When I run the same code in the pyspark shell on the remote server, it works without error.

The currently active SparkContext was created at:

(No active SparkContext.)

    at org.apache.spark.SparkContext.assertNotStopped(SparkContext.scala:100)
    at org.apache.spark.SparkContext.getSchedulingMode(SparkContext.scala:1768)
    at org.apache.spark.SparkContext.postEnvironmentUpdate(SparkContext.scala:2411)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:563)
    at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:236)
    at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
    at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
    at py4j.GatewayConnection.run(GatewayConnection.java:214)
    at java.lang.Thread.run(Thread.java:748)
user157894

1 Answer


It sounds like you haven't set Jupyter to be the pyspark driver. Before controlling pyspark from Jupyter, you must first set PYSPARK_DRIVER_PYTHON=jupyter and PYSPARK_DRIVER_PYTHON_OPTS='notebook'. If I am not mistaken, the libexec/bin/pyspark script (on OSX) contains instructions for setting up the Jupyter notebook.
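As a minimal sketch of that setup, assuming a bash shell and a pyspark launcher on the PATH (the <IP> master URL is carried over from the question):

# Make Jupyter the driver program for pyspark (bash syntax assumed)
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS='notebook'

# Launching pyspark now starts a Jupyter notebook as the driver;
# a SparkContext can then be created in the notebook against the standalone master
pyspark --master spark://<IP>:7077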

Grr
  • This quick-and-dirty solution will cause problems downstream with `spark-submit` (https://stackoverflow.com/questions/46772280/spark-submit-cant-locate-local-file/46773025#46773025). Better to handle it through Jupyter kernels (https://stackoverflow.com/questions/46286021/how-to-use-jupyter-sparkr-and-custom-r-install/46346658#46346658). – desertnaut Oct 23 '17 at 11:45
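To illustrate the kernel-based approach from the second link, a sketch that registers a PySpark kernel by writing a kernel.json; the kernels directory, all paths, the py4j zip version, and the master URL are assumptions to adjust for your install:

# Assumed locations: adjust SPARK_HOME, the Python interpreter, the py4j
# zip version, and the master URL to match your installation.
mkdir -p ~/.local/share/jupyter/kernels/pyspark
cat > ~/.local/share/jupyter/kernels/pyspark/kernel.json <<'EOF'
{
  "display_name": "PySpark (Spark 2.1)",
  "language": "python",
  "argv": ["/usr/bin/python", "-m", "ipykernel", "-f", "{connection_file}"],
  "env": {
    "SPARK_HOME": "/opt/spark",
    "PYTHONPATH": "/opt/spark/python:/opt/spark/python/lib/py4j-0.10.4-src.zip",
    "PYSPARK_SUBMIT_ARGS": "--master spark://<IP>:7077 pyspark-shell"
  }
}
EOF

After restarting Jupyter, the kernel appears in the notebook UI, and a SparkContext can be created in code exactly as in the question, leaving spark-submit untouched.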