I am using Python 2.7 with a Spark standalone cluster in client mode.
I want to use JDBC for MySQL and found that I need to load the connector using the --jars
argument. I have the JDBC jar locally and manage to load it with the pyspark console, like here.
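(For reference, the console invocation that works for me looks roughly like `pyspark --jars /var/nfs/general/mysql-connector-java-5.1.26.jar`; the exact path is just my local setup.)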
When I write a Python script inside my IDE, using pyspark, I don't manage to load the additional jar mysql-connector-java-5.1.26.jar
and keep getting a

No suitable driver

error.
How can I load additional jar files when running a Python script in client mode, using a standalone cluster and referring to a remote master?
Edit: added some code

This is the basic code I am using. I use pyspark with a SparkContext created from Python, i.e. I do not use spark-submit directly, and I don't understand how to use spark-submit parameters in this case...
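As far as I understand (this is my assumption from reading around), spark-submit parameters have to go through the PYSPARK_SUBMIT_ARGS environment variable before the SparkContext is created, and the string must end with "pyspark-shell", e.g. to pass --jars:

import os

# assumption: this must run before SparkContext() is constructed,
# and the string must end with "pyspark-shell"
os.environ["PYSPARK_SUBMIT_ARGS"] = "--jars /var/nfs/general/mysql-connector-java-5.1.26.jar pyspark-shell"

This is the full function I use to create the context: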
def createSparkContext(masterAddress=algoMaster):
    """
    :return: a SparkContext suitable for my configuration
    Note the IP for the master; the app name is not important, just for show.
    """
    from pyspark import SparkConf
    from pyspark import SparkContext
    import os

    # must be set before the SparkContext is created
    SUBMIT_ARGS = "--driver-class-path /var/nfs/general/mysql-connector-java-5.1.43 pyspark-shell"
    # SUBMIT_ARGS = "--packages com.databricks:spark-csv_2.11:1.2.0 pyspark-shell"
    os.environ["PYSPARK_SUBMIT_ARGS"] = SUBMIT_ARGS

    conf = SparkConf()
    # conf.set("spark.driver.extraClassPath", "var/nfs/general/mysql-connector-java-5.1.43")
    conf.setMaster(masterAddress)
    conf.setAppName('spark-basic')
    conf.set("spark.executor.memory", "2G")
    # conf.set("spark.executor.cores", "4")
    conf.set("spark.driver.memory", "3G")
    conf.set("spark.driver.cores", "3")
    # conf.set("spark.driver.extraClassPath", "/var/nfs/general/mysql-connector-java-5.1.43")

    sc = SparkContext(conf=conf)
    print sc._conf.get("spark.executor.extraClassPath")
    return sc
from pyspark.sql import SQLContext

sc = createSparkContext()
sql = SQLContext(sc)
df = sql.read.format('jdbc').options(
    url='jdbc:mysql://ip:port?user=user&password=pass',
    dbtable='(select * from tablename limit 100) as tablename'
).load()
print df.head()
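For reference, a variant of the same read that names the driver class explicitly; the class name com.mysql.jdbc.Driver is an assumption on my part based on the standard MySQL connector:

df = sql.read.format('jdbc').options(
    url='jdbc:mysql://ip:port?user=user&password=pass',
    dbtable='(select * from tablename limit 100) as tablename',
    driver='com.mysql.jdbc.Driver'  # assumption: explicit driver class name
).load()

As far as I can tell, this still only works if the connector jar actually ends up on the driver's classpath.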
Thanks