My code is:
```python
from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext

APP_NAME = "mysql_query"

if __name__ == "__main__":
    conf = SparkConf().setAppName(APP_NAME).setMaster("local[*]")
    sc = SparkContext(conf=conf)
    sqlContext = SQLContext(sc)

    hostname = "hostname"
    dbname = "database_name"
    jdbcPort = 3306
    username = "username"
    password = "password"

    jdbc_url = "jdbc:mysql://{}:{}/{}?user={}&password={}".format(
        hostname, jdbcPort, dbname, username, password)

    # Subquery must be aliased so Spark can treat it as a table
    query = "(SELECT * XXXXXXX_XXXX_XXX_XX) t1_alias"

    df = sqlContext.read.format("jdbc").options(
        driver="com.mysql.jdbc.Driver",
        url=jdbc_url,
        dbtable=query,
    ).load()
```
The script lives in an S3 bucket. I've SSH-ed into the EMR master node, and every time I submit it with:

```
spark-submit --master yarn --deploy-mode cluster mysql_spark.py
```

it fails with:

```
java.lang.ClassNotFoundException: com.mysql.jdbc.Driver
```

I have already installed the required JDBC driver. What's the issue here? Help!
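For reference, one variant I could try would be shipping the connector jar explicitly with `--jars`, since in cluster deploy mode a jar installed only on the master node may not reach the driver and executors. A sketch, assuming a hypothetical jar location `/home/hadoop/mysql-connector-java.jar` (the actual install path may differ):

```shell
# Hypothetical path to the MySQL Connector/J jar -- adjust to wherever
# the driver was actually installed. --jars distributes it to the
# driver and executor classpaths when the job runs on YARN.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --jars /home/hadoop/mysql-connector-java.jar \
  mysql_spark.py
```

Would that be the right approach, or is something else going on?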