I was having the exact same problem on an AWS EMR cluster (emr-5.31.0).
Setting spark.driver.extraClassPath and spark.executor.extraClassPath in SparkSession.builder.config(), in spark-defaults.conf, or via the spark-submit --jars flag pointing at the location of ojdbc6.jar did not work.
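For reference, this is roughly what the builder-config variant that failed for me looked like (the local jar path is illustrative):

from pyspark.sql import SparkSession

# This did NOT pick up the driver on EMR for me; the jar path is illustrative
spark = (
    SparkSession
    .builder
    .config("spark.driver.extraClassPath", "/home/hadoop/ojdbc6.jar")
    .config("spark.executor.extraClassPath", "/home/hadoop/ojdbc6.jar")
    .getOrCreate()
)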
I finally got it to work by passing the Maven coordinates to spark.jars.packages, and then I also had to set spark.driver.extraClassPath and spark.executor.extraClassPath to $HOME/.ivy2/jars/*:
import os
from pyspark.sql import SparkSession

# Maven coordinates of the packages Spark should fetch at startup
spark_packages_list = [
    'io.delta:delta-core_2.11:0.6.1',
    'com.oracle.database.jdbc:ojdbc6:11.2.0.4',
]
spark_packages = ",".join(spark_packages_list)

home = os.getenv("HOME")

spark = (
    SparkSession
    .builder
    .config("spark.jars.packages", spark_packages)
    # Ivy downloads the packages to ~/.ivy2/jars; expose them on both class paths
    .config("spark.driver.extraClassPath", f"{home}/.ivy2/jars/*")
    .config("spark.executor.extraClassPath", f"{home}/.ivy2/jars/*")
    .getOrCreate()
)
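You can sanity-check that the settings actually took effect by reading them back from the live SparkConf (a quick optional check, not part of the original recipe):

# Optional sanity check: confirm the settings were applied
for key in ("spark.jars.packages",
            "spark.driver.extraClassPath",
            "spark.executor.extraClassPath"):
    print(key, "=", spark.sparkContext.getConf().get(key, "<not set>"))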
Then the following worked (change parameters accordingly):
host = "111.111.111.111"
port = "1234"
schema = "YourSchema"
URL = f"jdbc:oracle:thin:@{host}:{port}/{schema}"

# Read credentials from files; strip() drops any trailing newline
with open(f"{home}/username.file", "r") as f:
    username = f.read().strip()
with open(f"{home}/password.file", "r") as f:
    password = f.read().strip()

query = "SELECT * FROM YourTable"

df = (
    spark.read.format("jdbc")
    .option("url", URL)
    .option("query", query)
    .option("user", username)
    .option("password", password)
    .load()
)

df.printSchema()
df.show()
Or, equivalently, with spark.read.jdbc():
properties = {
    "user": username,
    "password": password,
}

df = spark.read.jdbc(
    url=URL,
    table="YourTable",
    properties=properties,
)

df.printSchema()
df.show()
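One difference to be aware of: the query option takes a bare SELECT, while spark.read.jdbc() takes a table name. If you need to filter with the second form, you can pass a parenthesized subquery aliased as a table (the subquery below is illustrative):

# Spark treats a parenthesized, aliased subquery like a table name
subquery = "(SELECT * FROM YourTable WHERE ROWNUM <= 100) t"
df_sub = spark.read.jdbc(url=URL, table=subquery, properties=properties)
df_sub.show()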