I'm trying to connect to a Greenplum database using PySpark, but I get an error when executing the code below.

from pyspark.sql import SparkSession

spark = SparkSession \
    .builder \
    .appName("Python Spark SQL basic example") \
    .config("spark.jars", "C:/Users/SKamaliyev/Documents/Drivers/db/postgresql-42.2.20.jar") \
    .getOrCreate()

df = spark.read \
    .format("jdbc") \
    .option("url", "jdbc:postgresql://address:port/gpl") \
    .option("dbtable", "dwh.vw_dm_subs_kpi_monthly") \
    .option("user", "skamaliyev") \
    .option("password", "passmy") \
    .option("driver", "org.postgresql.Driver") \
    .load()

The error:

An error occurred while calling o192.load.
: java.lang.ClassNotFoundException: org.postgresql.Driver
    at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:445)
    at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:587)
    at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:520) ...

How can I solve this?

condexter
  • Don't you also need some dependencies? You should probably use `spark.jars.packages` and refer to a Maven package – Steven Jul 18 '22 at 13:54
  • Does this answer your question? [Using pyspark to connect to PostgreSQL](https://stackoverflow.com/questions/34948296/using-pyspark-to-connect-to-postgresql) – Steven Jul 18 '22 at 13:56

1 Answer

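The `ClassNotFoundException` means the JVM cannot find `org.postgresql.Driver`, so the driver is not on Spark's classpath. Try downloading the PostgreSQL JDBC driver and making sure the path you give Spark points to that JAR.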
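
Before digging into Spark itself, it may help to rule out a wrong path or a damaged download. Here is a minimal sketch using only the Python standard library; `jar_contains_driver` and `DRIVER_CLASS_ENTRY` are names I made up for illustration, and the check assumes the driver class sits at the usual `org/postgresql/Driver.class` entry inside the JAR.

```python
import zipfile
from pathlib import Path

# Assumed location of the driver class inside the PostgreSQL JDBC JAR.
DRIVER_CLASS_ENTRY = "org/postgresql/Driver.class"


def jar_contains_driver(jar_path, class_entry=DRIVER_CLASS_ENTRY):
    """Return True if jar_path is a readable JAR that contains class_entry."""
    path = Path(jar_path)
    if not path.is_file():
        # Wrong path, or the file was never downloaded.
        return False
    try:
        # A JAR is a ZIP archive, so zipfile can list its entries.
        with zipfile.ZipFile(path) as jar:
            return class_entry in jar.namelist()
    except zipfile.BadZipFile:
        # The file exists but is not a valid archive (e.g. a truncated download).
        return False


if __name__ == "__main__":
    # Path taken from the question; adjust it to your machine.
    jar = "C:/Users/SKamaliyev/Documents/Drivers/db/postgresql-42.2.20.jar"
    print(jar_contains_driver(jar))
```

If this prints `False`, fix the path or download the JAR again. If it prints `True` and the error persists, one likely cause (an assumption on my part, since the question does not say how the code is run) is that a Spark session was already active, in which case `getOrCreate()` reuses it and the `spark.jars` setting has no effect on the running JVM. Stopping the existing session or restarting the Python process before building a new one may help. Alternatively, as the comment above suggests, `spark.jars.packages` with a Maven coordinate such as `org.postgresql:postgresql:42.2.20` lets Spark fetch the driver itself.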

Ivan Novick