I have written a PySpark script to read data from a database and a CSV file. The script never completes; it just keeps running. Below is the code:
import pyspark
from pyspark.sql import SparkSession
from pyspark.context import SparkContext
import os

# Point PySpark at the Java installation (placeholder path)
os.environ['JAVA_HOME'] = 'path/to/java/bin/'

# Build the session with the PostgreSQL JDBC driver on the driver and executor classpaths
spark = SparkSession.builder.appName("") \
    .config("spark.jars", "path/to/postgresql-42.3.2.jar") \
    .config("spark.driver.extraClassPath", "path/to/postgresql-42.3.2.jar") \
    .config("spark.executor.extraClassPath", "path/to/postgresql-42.3.2.jar") \
    .getOrCreate()

# Read a table from PostgreSQL over JDBC
df_etl_control = spark.read \
    .format("jdbc") \
    .option("url", "jdbc:postgresql://database_connection") \
    .option("query", "select * from table") \
    .option("user", "wwwww") \
    .option("password", "wwwwww") \
    .load()

# Read the CSV file, treating the first row as a header
df = spark.read.option("header", True).csv("path/to/csv")
df.printSchema()
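For reference, this is a stripped-down sketch that keeps only the CSV read (same placeholder paths, no JDBC options); I can use it to check whether the hang is specific to the database read:

from pyspark.sql import SparkSession
import os

# Same placeholder as in the full script
os.environ['JAVA_HOME'] = 'path/to/java/bin/'

# No JDBC jars or classpath settings, so only the CSV read is exercised
spark = SparkSession.builder.appName("csv_only_check").getOrCreate()

df = spark.read.option("header", True).csv("path/to/csv")
df.printSchema()

spark.stop()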
Before I added the JAVA_HOME path (os.environ['JAVA_HOME'] = 'path/to/java/bin/'), I was getting RuntimeError: Java gateway process exited before sending its port number. After adding the path, that error is gone, but the job still never completes.
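In case the environment itself is the issue, a quick sanity check along these lines (path is the same placeholder as above) tells me whether the java binary under that path actually runs:

import os
import subprocess

# Same placeholder path used for JAVA_HOME in the script above
java_home = 'path/to/java/bin/'

# Print the JVM version PySpark would pick up; if this fails, the problem
# is the Java environment rather than the Spark code itself
subprocess.run([os.path.join(java_home, 'java'), '-version'], check=True)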
Any suggestions would be great.