
My code is:

from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext

APP_NAME = "mysql_query"

if __name__ == "__main__":
    conf = SparkConf().setAppName(APP_NAME).setMaster("local[*]")
    sc = SparkContext(conf=conf)
    sqlContext = SQLContext(sc)

    hostname = "hostname"
    dbname = "database_name"
    jdbcPort = 3306
    username = "username"
    password = "password"
    jdbc_url = "jdbc:mysql://{}:{}/{}?user={}&password={}".format(
        hostname, jdbcPort, dbname, username, password)

    # Subquery must be aliased so Spark can treat it as a table
    query = "(SELECT * XXXXXXX_XXXX_XXX_XX) t1_alias"

    df = sqlContext.read.format('jdbc').options(
        driver='com.mysql.jdbc.Driver',
        url=jdbc_url,
        dbtable=query).load()
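A side note, unrelated to the ClassNotFoundException: if the real password or username contains characters like &, @, or spaces, embedding them raw in the JDBC URL will break its parsing. URL-encoding the credentials avoids that. A minimal sketch using only the standard library (the credential values here are hypothetical placeholders):

```python
from urllib.parse import quote_plus

# Hypothetical credentials, for illustration only
hostname = "hostname"
jdbcPort = 3306
dbname = "database_name"
username = "user@name"
password = "p&ss word"

# quote_plus escapes &, @, spaces, etc., so the URL query string stays parseable
jdbc_url = "jdbc:mysql://{}:{}/{}?user={}&password={}".format(
    hostname, jdbcPort, dbname, quote_plus(username), quote_plus(password))
```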

This code lives in an S3 bucket. I SSH into the EMR master node and submit it with spark-submit --master yarn --deploy-mode cluster mysql_spark.py, but every time I get the error java.lang.ClassNotFoundException: com.mysql.jdbc.Driver.

I have installed the required JDBC driver. What's the issue here? Help!

ouila

1 Answer


Try submitting with the MySQL connector jar passed explicitly, so it is shipped to the driver and executors:

spark-submit --master yarn \
  --deploy-mode cluster \
  --jars mysql-connector-java-8.0.19.jar \
  --driver-class-path mysql-connector-java-8.0.19.jar \
  --conf spark.executor.extraClassPath=mysql-connector-java-8.0.19.jar \
  mysql_spark.py
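If copying the jar to the cluster is awkward, spark-submit can also fetch the connector from Maven Central with --packages instead of --jars (the 8.0.19 version here only mirrors the command above; any recent Connector/J release should work). Note that Connector/J 8.x renamed the driver class to com.mysql.cj.jdbc.Driver; the old com.mysql.jdbc.Driver name still resolves but is deprecated. A sketch:

```shell
# Resolves the connector from Maven Central and distributes it to all nodes
spark-submit --master yarn \
  --deploy-mode cluster \
  --packages mysql:mysql-connector-java:8.0.19 \
  mysql_spark.py
```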

See this related answer for reference.

Som