From working with PySpark, PostgreSQL, and Apache Sedona, I learned two ways to solve this.
Method 1: Download the JAR file and add to spark.jars
In order to use PostgreSQL on Spark, I needed to add the JDBC driver (JAR file) to PySpark.
First, I created a jars directory at the same level as my program and stored the postgresql-42.5.0.jar file there.
Then, I added this config to the SparkSession:
SparkSession.builder.config("spark.jars", "{JAR_FILE_PATH}")
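from pyspark.sql import SparkSession

# The path is relative to the directory the program is launched from.
spark = (
    SparkSession.builder
    .config("spark.jars", "jars/postgresql-42.5.0.jar")
    .master("local[*]")
    .appName("Example - Add a JAR file")
    .getOrCreate()
)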
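If the jars directory ends up holding more than one driver, the value for spark.jars can be built from the directory contents, since spark.jars takes a comma-separated list of paths. The following is only a minimal sketch: jars_config_value is a hypothetical helper name, not part of PySpark, and it assumes every .jar file in the directory should be loaded.

```python
from pathlib import Path


def jars_config_value(jars_dir: str) -> str:
    """Return a comma-separated list of every .jar file found in jars_dir.

    Hypothetical helper for illustration; it is not part of PySpark.
    """
    jar_paths = sorted(Path(jars_dir).glob("*.jar"))
    if not jar_paths:
        raise FileNotFoundError(f"No .jar files found in {jars_dir!r}")
    # spark.jars expects the paths joined by commas, with no spaces between them.
    return ",".join(str(path) for path in jar_paths)
```

It could then be passed to the builder as .config("spark.jars", jars_config_value("jars")).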
Method 2: Use Maven Central coordinate and spark.jars.packages
If your dependency JAR files are available on Maven Central, you can use this method and avoid maintaining any JAR files yourself.
Steps
Find your package on Maven Central Repository Search

Select the correct package artifact and copy the Maven Central coordinate

In Python, call SparkSession.builder.config("spark.jars.packages", "{MAVEN_CENTRAL_COORDINATE}").
from pyspark.sql import SparkSession
from sedona.utils import KryoSerializer, SedonaKryoRegistrator

spark = (
    SparkSession.builder
    .appName('Example - adding many Maven packages')
    .config("spark.serializer", KryoSerializer.getName)
    .config("spark.kryo.registrator", SedonaKryoRegistrator.getName)
    # All coordinates are joined into one comma-delimited string.
    .config("spark.jars.packages",
            "org.postgresql:postgresql:42.5.0,"
            + "org.apache.sedona:sedona-python-adapter-3.0_2.12:1.2.1-incubating,"
            + "org.datasyslab:geotools-wrapper:1.1.0-25.2")
    .getOrCreate()
)
Pros of using spark.jars.packages
- You can add several packages
- You don't have to manage the fat JAR files
Cons of using spark.jars.packages
The .config("spark.jars.packages", ...) call accepts a single string value, so to add several packages you need to concatenate the package coordinates using , as the delimiter.
"org.postgresql:postgresql:42.5.0,"
+ "org.apache.sedona:sedona-python-adapter-3.0_2.12:1.2.1-incubating,"
+ "org.datasyslab:geotools-wrapper:1.1.0-25.2"
Warning: the resulting string must not contain newlines, spaces, or tabs. Stray whitespace causes nasty bugs that produce irrelevant error logs.
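One way to guard against the whitespace problem is to keep the coordinates in a list and join them in one place. This is a rough sketch rather than anything PySpark provides: join_packages is a hypothetical helper, and the check assumes each coordinate has the three-part groupId:artifactId:version form that spark.jars.packages expects.

```python
import re

# groupId:artifactId:version, with no whitespace, commas, or extra colons.
_COORDINATE = re.compile(r"[^\s:,]+:[^\s:,]+:[^\s:,]+")


def join_packages(coordinates):
    """Join Maven coordinates into one comma-delimited string.

    Hypothetical helper for illustration; it is not part of PySpark.
    Raises ValueError if a coordinate contains whitespace or is malformed.
    """
    for coordinate in coordinates:
        if not _COORDINATE.fullmatch(coordinate):
            raise ValueError(f"Malformed Maven coordinate: {coordinate!r}")
    return ",".join(coordinates)


packages = join_packages([
    "org.postgresql:postgresql:42.5.0",
    "org.apache.sedona:sedona-python-adapter-3.0_2.12:1.2.1-incubating",
    "org.datasyslab:geotools-wrapper:1.1.0-25.2",
])
```

The result can be passed as .config("spark.jars.packages", packages), so a stray space or line break fails early with a clear message instead of surfacing later in Spark's logs.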