My code is:
```python
from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext

APP_NAME = "mysql_query"

if __name__ == "__main__":
    conf = SparkConf().setAppName(APP_NAME).setMaster("local[*]")
    sc = SparkContext(conf=conf)
    sqlContext = SQLContext(sc)

    hostname = "hostname"
    dbname = "database_name"
    jdbcPort = 3306
    username = "username"
    password = "password"

    jdbc_url = "jdbc:mysql://{}:{}/{}?user={}&password={}".format(
        hostname, jdbcPort, dbname, username, password)

    # Subquery must be aliased so Spark can treat it as a table
    query = "(SELECT * XXXXXXX_XXXX_XXX_XX) t1_alias"

    df = sqlContext.read.format("jdbc").options(
        driver="com.mysql.jdbc.Driver",
        url=jdbc_url,
        dbtable=query,
    ).load()
```
The script lives in an S3 bucket. I've SSH-ed into the EMR master node, and every time I submit it with:

```
spark-submit --master yarn --deploy-mode cluster mysql_spark.py
```

it fails with:

```
java.lang.ClassNotFoundException: com.mysql.jdbc.Driver
```

I have already installed the required JDBC driver. What's the issue here? Help!
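For reference, one variant I could try would be shipping the connector jar explicitly with `--jars`, since in cluster deploy mode a jar installed only on the master node may not reach the driver and executors. A sketch, assuming a hypothetical jar location `/home/hadoop/mysql-connector-java.jar` (the actual install path may differ):

```shell
# Hypothetical path to the MySQL Connector/J jar -- adjust to wherever
# the driver was actually installed. --jars distributes it to the
# driver and executor classpaths when the job runs on YARN.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --jars /home/hadoop/mysql-connector-java.jar \
  mysql_spark.py
```

Would that be the right approach, or is something else going on?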