I have written a PySpark script to read data from a database and a CSV file. The script never completes; it just keeps running. Below is the code:
import pyspark
from pyspark.sql import SparkSession
from pyspark.context import SparkContext
import os

# Point PySpark at the Java installation (placeholder path)
os.environ['JAVA_HOME'] = 'path/to/java/bin/'

# Build the session with the PostgreSQL JDBC driver on the driver and executor classpaths
spark = SparkSession.builder.appName("") \
    .config("spark.jars", "path/to/postgresql-42.3.2.jar") \
    .config("spark.driver.extraClassPath", "path/to/postgresql-42.3.2.jar") \
    .config("spark.executor.extraClassPath", "path/to/postgresql-42.3.2.jar") \
    .getOrCreate()

# Read a table from PostgreSQL over JDBC
df_etl_control = spark.read \
    .format("jdbc") \
    .option("url", "jdbc:postgresql://database_connection") \
    .option("query", "select * from table") \
    .option("user", "wwwww") \
    .option("password", "wwwwww") \
    .load()

# Read the CSV file, treating the first row as a header
df = spark.read.option("header", True).csv("path/to/csv")
df.printSchema()
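For reference, this is a stripped-down sketch that keeps only the CSV read (same placeholder paths, no JDBC options); I can use it to check whether the hang is specific to the database read:

from pyspark.sql import SparkSession
import os

# Same placeholder as in the full script
os.environ['JAVA_HOME'] = 'path/to/java/bin/'

# No JDBC jars or classpath settings, so only the CSV read is exercised
spark = SparkSession.builder.appName("csv_only_check").getOrCreate()

df = spark.read.option("header", True).csv("path/to/csv")
df.printSchema()

spark.stop()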
Before I added the JAVA_HOME path (os.environ['JAVA_HOME'] = 'path/to/java/bin/'), I was getting RuntimeError: Java gateway process exited before sending its port number. After adding the path, that error is gone, but the job still never completes.
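In case the environment itself is the issue, a quick sanity check along these lines (path is the same placeholder as above) tells me whether the java binary under that path actually runs:

import os
import subprocess

# Same placeholder path used for JAVA_HOME in the script above
java_home = 'path/to/java/bin/'

# Print the JVM version PySpark would pick up; if this fails, the problem
# is the Java environment rather than the Spark code itself
subprocess.run([os.path.join(java_home, 'java'), '-version'], check=True)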
Any suggestions would be great.