
I have written a PySpark script that reads data from a PostgreSQL database (over JDBC) and from a CSV file. The script never completes; it just keeps running. Below is the code:

import os

from pyspark.sql import SparkSession

# JAVA_HOME has to be set before the SparkSession (and therefore the JVM) is created.
os.environ['JAVA_HOME'] = 'path/to/java/bin/'

# Put the PostgreSQL JDBC driver on the driver and executor classpaths.
spark = SparkSession.builder.appName("") \
    .config("spark.jars", "path/to/postgresql-42.3.2.jar") \
    .config("spark.driver.extraClassPath", "path/to/postgresql-42.3.2.jar") \
    .config("spark.executor.extraClassPath", "path/to/postgresql-42.3.2.jar") \
    .getOrCreate()

# Read the result of a query from PostgreSQL over JDBC.
df_etl_control = spark.read \
    .format("jdbc") \
    .option("url", "jdbc:postgresql://database_connection") \
    .option("query", "select * from table") \
    .option("user", "wwwww") \
    .option("password", "wwwwww") \
    .load()

# Read the CSV file, using its first line as the header.
df = spark.read.option("header", True).csv("path/to/csv")
df.printSchema()
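If the hang is in the JDBC read (load() already contacts the database to resolve the query's schema, so it can block there before any Spark job runs), making the driver give up instead of waiting indefinitely helps show where it stalls. A minimal sketch, assuming the PostgreSQL JDBC driver's connectTimeout and socketTimeout URL parameters (both in seconds; socketTimeout defaults to 0, meaning no limit). The helper postgres_jdbc_url and the host, port and database values are hypothetical; nothing in the question confirms that the JDBC step is where it hangs.

```python
from urllib.parse import urlencode


def postgres_jdbc_url(host: str, port: int, database: str,
                      connect_timeout_s: int = 10,
                      socket_timeout_s: int = 60) -> str:
    """Build a PostgreSQL JDBC URL with explicit timeouts (hypothetical helper).

    connectTimeout bounds the TCP connect; socketTimeout bounds each socket read,
    so a stalled query fails with an error instead of blocking forever.
    """
    params = urlencode({
        "connectTimeout": connect_timeout_s,
        "socketTimeout": socket_timeout_s,
    })
    return f"jdbc:postgresql://{host}:{port}/{database}?{params}"
```

The result would be passed as the url option, e.g. .option("url", postgres_jdbc_url("db-host", 5432, "mydb")), in place of the placeholder URL. If the notebook cell then fails with a timeout error instead of running forever, the database connection is the likely bottleneck.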

Before I added the JAVA_HOME line os.environ['JAVA_HOME'] = 'path/to/java/bin/', I was getting the error "RuntimeError: Java gateway process exited before sending its port number". After adding it, that error went away, but the job never completes.
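A quick sanity check on the JAVA_HOME value: Spark's launcher scripts start the JVM as $JAVA_HOME/bin/java, so JAVA_HOME normally names the JDK/JRE install root rather than its bin directory. The sketch below uses a hypothetical helper, resolve_java_home, that accepts either form and returns the root, raising an error if no java executable is found. It only rules out one misconfiguration; it does not by itself explain the hang.

```python
from pathlib import Path


def resolve_java_home(candidate: str) -> str:
    """Return the install root that JAVA_HOME should point at (hypothetical helper).

    Accepts either the root (which contains bin/java) or the bin directory itself.
    """
    path = Path(candidate).expanduser().resolve()
    exe_names = ("java", "java.exe")  # Unix and Windows executable names
    # Candidate is already the install root: <root>/bin/java exists.
    if any((path / "bin" / name).is_file() for name in exe_names):
        return str(path)
    # Candidate is the bin directory: <candidate>/java exists, so use its parent.
    if any((path / name).is_file() for name in exe_names):
        return str(path.parent)
    raise FileNotFoundError(f"no java executable found under {path}")
```

In the notebook it would replace the hard-coded assignment, e.g. os.environ['JAVA_HOME'] = resolve_java_home('path/to/java/bin/'), run before getOrCreate(). If a session was already started in the same kernel, getOrCreate() returns that existing session and its already-running JVM, so the kernel has to be restarted for a changed JAVA_HOME to take effect.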

Any suggestion would be great.

Oli
Jim Macaulay
  • How are you running it? Jupyter? spark-submit? – Guy Melul Aug 02 '22 at 07:08
  • I am running it in a Jupyter notebook. – Jim Macaulay Aug 02 '22 at 08:05
  • Does this answer your question? [Pyspark: Exception: Java gateway process exited before sending the driver its port number](https://stackoverflow.com/questions/31841509/pyspark-exception-java-gateway-process-exited-before-sending-the-driver-its-po) – Koedlt Apr 10 '23 at 19:08

0 Answers