I am working with PySpark and have tried just about everything. I followed much of the advice in the Stack Overflow post "Pyspark: Exception: Java gateway process exited before sending the driver its port number", to no avail. I am sure that I have Spark installed correctly. I followed this YouTube video for the installation: https://www.youtube.com/watch?v=cYL42BBL3Fo

I am using Spark 3.3.1.

Here is how I am defining the SparkSession:

from pyspark.sql import SparkSession

spark = SparkSession \
    .builder \
    .master('yarn') \
    .appName('ChainAnalysis_v0') \
    .config('spark.jars', 'gs://spark-lib/bigquery/spark-bigquery-latest.jar') \
    .config('spark.executor.cores', '3') \
    .config('spark.executor.memory', '5g') \
    .config('spark.sql.broadcastTimeout', '36000') \
    .getOrCreate()

Here is how my environment variables are configured:

import os

# Raw strings keep the Windows backslashes from being read as escape sequences.
os.environ["JAVA_HOME"] = r"C:\Program Files\Java\jdk-11.0.16.1"
os.environ["SPARK_HOME"] = r"C:\Spark"
os.environ["PYSPARK_SUBMIT_ARGS"] = "--master local[3] pyspark-shell"
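
A minimal sketch of a sanity check for the variables above, in case it helps narrow things down. It only tests whether JAVA_HOME and SPARK_HOME point at directories that contain a Java launcher and a spark-submit script. The function name check_java_env is hypothetical, and the sketch assumes the usual layout where both live in a bin subdirectory. A clean result does not prove the gateway will start.

```python
import os
from pathlib import Path


def check_java_env(env=None):
    """Return a list of problems found in JAVA_HOME / SPARK_HOME (empty if none)."""
    env = os.environ if env is None else env
    problems = []

    java_home = env.get("JAVA_HOME")
    if not java_home:
        problems.append("JAVA_HOME is not set")
    else:
        # The launcher is java.exe on Windows and java elsewhere.
        bin_dir = Path(java_home) / "bin"
        if not any((bin_dir / name).is_file() for name in ("java", "java.exe")):
            problems.append(f"no java executable under {bin_dir}")

    spark_home = env.get("SPARK_HOME")
    if not spark_home:
        problems.append("SPARK_HOME is not set")
    else:
        # Assumed layout: spark-submit (or spark-submit.cmd on Windows) in bin.
        bin_dir = Path(spark_home) / "bin"
        if not any((bin_dir / name).is_file() for name in ("spark-submit", "spark-submit.cmd")):
            problems.append(f"no spark-submit script under {bin_dir}")

    return problems
```

Calling check_java_env() with no arguments inspects the current process environment, so it should run after the os.environ assignments and before the SparkSession is built.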


Even with all of this, I am still running into the error:

RuntimeError: Java gateway process exited before sending its port number

Any suggestions?

hvs338