
I am trying to follow this Python notebook. I installed Spark directly in the notebook (!pip install pyspark), but when I do:

from pyspark.sql import SparkSession

spark = SparkSession \
    .builder \
    .appName("question recommendation") \
    .config("spark.driver.maxResultSize", "96g") \
    .config("spark.driver.memory", "96g") \
    .config("spark.executor.memory", "8g") \
    .config("spark.master", "local[12]") \
    .getOrCreate()
sc = spark.sparkContext

I get a RuntimeError on the first line:

RuntimeError                              Traceback (most recent call last)
<ipython-input-17-1b87e1472109> in <module>
      1 # spark config
----> 2 spark = SparkSession \
      3     .builder \
      4     .appName("question recommendation") \
      5     .config("spark.driver.maxResultSize", "96g") \

~\anaconda3\lib\site-packages\pyspark\sql\session.py in getOrCreate(self)
    226                             sparkConf.set(key, value)
    227                         # This SparkContext may be an existing one.
--> 228                         sc = SparkContext.getOrCreate(sparkConf)
    229                     # Do not update `SparkConf` for existing `SparkContext`, as it's shared
    230                     # by all sessions.

~\anaconda3\lib\site-packages\pyspark\context.py in getOrCreate(cls, conf)
    390         with SparkContext._lock:
    391             if SparkContext._active_spark_context is None:
--> 392                 SparkContext(conf=conf or SparkConf())
    393             return SparkContext._active_spark_context
    394 

~\anaconda3\lib\site-packages\pyspark\context.py in __init__(self, master, appName, sparkHome, pyFiles, environment, batchSize, serializer, conf, gateway, jsc, profiler_cls)
    142                 " is not allowed as it is a security risk.")
    143 
--> 144         SparkContext._ensure_initialized(self, gateway=gateway, conf=conf)
    145         try:
    146             self._do_init(master, appName, sparkHome, pyFiles, environment, batchSize, serializer,

~\anaconda3\lib\site-packages\pyspark\context.py in _ensure_initialized(cls, instance, gateway, conf)
    337         with SparkContext._lock:
    338             if not SparkContext._gateway:
--> 339                 SparkContext._gateway = gateway or launch_gateway(conf)
    340                 SparkContext._jvm = SparkContext._gateway.jvm
    341 

~\anaconda3\lib\site-packages\pyspark\java_gateway.py in launch_gateway(conf, popen_kwargs)
    106 
    107             if not os.path.isfile(conn_info_file):
--> 108                 raise RuntimeError("Java gateway process exited before sending its port number")
    109 
    110             with open(conn_info_file, "rb") as info:

RuntimeError: Java gateway process exited before sending its port number

I am very new to Apache Spark. Is there anything I have installed incorrectly? Should I have installed it via Conda? Is there anything on my system that I need to check?

futuredataengineer
    Does this answer your question? [Pyspark: Exception: Java gateway process exited before sending the driver its port number](https://stackoverflow.com/questions/31841509/pyspark-exception-java-gateway-process-exited-before-sending-the-driver-its-po) – vladsiv Nov 17 '21 at 08:01

1 Answer


The main clue to the error is the last line of the traceback:

"RuntimeError: Java gateway process exited before sending its port number"
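That message is raised in PySpark's launch_gateway (the last frame of the traceback, java_gateway.py). As far as I understand it, PySpark starts the Spark/Java launcher as a child process and waits for it to write a small connection-info file containing its port; if the process exits before that file exists, you get this RuntimeError. In other words, the JVM side died during startup, so the real problem is usually in the Java setup rather than in your SparkSession code. The sketch below is a simplified, hypothetical reconstruction of that wait-for-file pattern, not PySpark's actual code: launch_and_wait is a made-up name, and a short Python child process stands in for the JVM.

```python
import os
import subprocess
import sys
import tempfile
import time


def launch_and_wait(cmd, conn_info_file, timeout=10.0):
    """Start `cmd` and wait until it creates `conn_info_file`.

    Hypothetical, simplified version of the handshake behind the error:
    if the child exits (or the timeout passes) before the file appears,
    raise a RuntimeError with the same message PySpark reports.
    """
    proc = subprocess.Popen(cmd)
    deadline = time.monotonic() + timeout
    # Poll while the child is still alive and has not written the file yet.
    while proc.poll() is None and not os.path.isfile(conn_info_file):
        if time.monotonic() > deadline:
            break
        time.sleep(0.1)
    if not os.path.isfile(conn_info_file):
        if proc.poll() is None:
            proc.kill()  # timed out: clean up the still-running child
            proc.wait()
        raise RuntimeError("Java gateway process exited before sending its port number")
    # The child announced itself; read what it wrote (the port, in the real case).
    with open(conn_info_file, "rb") as info:
        return info.read()


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "conn_info")
    # A child that dies immediately, like a JVM that cannot start.
    try:
        launch_and_wait([sys.executable, "-c", "raise SystemExit(1)"], path)
    except RuntimeError as err:
        print("RuntimeError:", err)
```

Polling for a file while also checking whether the child is still alive is what lets the parent tell "still starting up" apart from "already crashed", which is why the error says the process exited rather than that it timed out.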

You can check the older Stack Overflow question below for solutions:

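[Pyspark: Exception: Java gateway process exited before sending the driver its port number](https://stackoverflow.com/questions/31841509/pyspark-exception-java-gateway-process-exited-before-sending-the-driver-its-po)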
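A common reason the launcher dies is that no usable Java runtime is found. Here is a quick check you can run in the same notebook. It assumes, based on my reading of Spark's spark-class launcher script, that Spark uses $JAVA_HOME/bin/java when JAVA_HOME is set and otherwise the java found on PATH; check_java is a hypothetical helper, not part of PySpark.

```python
import os
import shutil
import subprocess


def check_java():
    """Report which Java executable Spark's launcher would likely use.

    Assumption: like spark-class, prefer $JAVA_HOME/bin/java when JAVA_HOME
    is set (without falling back to PATH), otherwise look for `java` on PATH.
    """
    java_home = os.environ.get("JAVA_HOME")
    if java_home:
        exe = "java.exe" if os.name == "nt" else "java"
        candidate = os.path.join(java_home, "bin", exe)
        if not os.path.isfile(candidate):
            candidate = None  # JAVA_HOME is set but contains no java binary
    else:
        candidate = shutil.which("java")

    version = None
    if candidate:
        try:
            # `java -version` prints its banner to stderr, not stdout.
            result = subprocess.run(
                [candidate, "-version"], capture_output=True, text=True, timeout=30
            )
            version = (result.stderr or result.stdout).strip() or None
        except (OSError, subprocess.TimeoutExpired) as err:
            version = f"failed to run: {err}"
    return {"JAVA_HOME": java_home, "java_executable": candidate, "version": version}


if __name__ == "__main__":
    for key, value in check_java().items():
        print(f"{key}: {value}")
```

If java_executable comes back as None, install a Java version supported by your PySpark release (or point JAVA_HOME at an existing installation) and restart the notebook kernel so the Python process sees the new environment.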

Sola Oshinowo