I'm trying to fetch table data as a DataFrame using PySpark.
The code below works, but I'm wondering whether I can specify the name of the JDBC driver without using the option
function on sparkSession.read():
from pyspark.sql import SparkSession

ss = (
    SparkSession.builder.appName(args.job_name)
    .config('spark.jars.packages', 'com.mysql:mysql-connector-j:8.0.31')
    .getOrCreate()
)

# inside the function that builds the DataFrame
return (
    ss.read
    .option("driver", "com.mysql.cj.jdbc.Driver")
    .jdbc(url=connection_str, table=table_name, column="id",
          lowerBound=0, upperBound=row_cnt, numPartitions=3)
)
Can I specify it with --driver-class-path, via the SparkSession config, or in some other way?
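To make it concrete, this is roughly the read I'm hoping to end up with: no per-read "driver" option, with the connector resolved from the classpath instead (connection_str, table_name and row_cnt are the same variables as above, and the jar would be supplied via --driver-class-path or an equivalent config):

# hoped-for version: no explicit "driver" option on the reader
df = ss.read.jdbc(
    url=connection_str,   # e.g. jdbc:mysql://host:3306/mydb
    table=table_name,
    column="id",
    lowerBound=0,
    upperBound=row_cnt,
    numPartitions=3,
)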
Edit:
I also tried spark.driver.extraClassPath, as suggested in "How to specify driver class path when using pyspark within a jupyter notebook?", but it didn't help.
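For reference, that attempt looked roughly like this; the jar path is just a placeholder for wherever the connector jar sits on the driver host:

ss = (
    SparkSession.builder.appName(args.job_name)
    .config('spark.jars.packages', 'com.mysql:mysql-connector-j:8.0.31')
    # placeholder path to the MySQL Connector/J jar on the driver machine
    .config('spark.driver.extraClassPath', '/path/to/mysql-connector-j-8.0.31.jar')
    .getOrCreate()
)

df = ss.read.jdbc(url=connection_str, table=table_name, column="id",
                  lowerBound=0, upperBound=row_cnt, numPartitions=3)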