
I am using Python 2.7 with a Spark standalone cluster in client mode.

I want to use JDBC for MySQL and found that I need to load the connector jar using the --jars argument. I have the jar locally and manage to load it with the pyspark console like here.

When I write a Python script in my IDE, using pyspark, I don't manage to load the additional jar mysql-connector-java-5.1.26.jar and keep getting a

no suitable driver

error.

How can I load additional jar files when running a Python script in client mode against a standalone cluster, referring to a remote master?

Edit: added some code. This is the basic code I am using; I create a SparkContext from Python with pyspark, i.e. I do not call spark-submit directly, and I don't understand how to pass spark-submit parameters in this case...

def createSparkContext(masterAdress=algoMaster):
    """
    :return: a SparkContext suitable for my configuration
     note the IP of the master
     the app name is not that important, it is just for show
    """
    from pyspark.mllib.util import MLUtils
    from pyspark import SparkConf
    from pyspark import SparkContext
    import os


    SUBMIT_ARGS = "--driver-class-path /var/nfs/general/mysql-connector-java-5.1.43 pyspark-shell"
    #SUBMIT_ARGS = "--packages com.databricks:spark-csv_2.11:1.2.0 pyspark-shell"
    os.environ["PYSPARK_SUBMIT_ARGS"] = SUBMIT_ARGS
    conf = SparkConf()
    #conf.set("spark.driver.extraClassPath", "var/nfs/general/mysql-connector-java-5.1.43")
    conf.setMaster(masterAdress)
    conf.setAppName('spark-basic')
    conf.set("spark.executor.memory", "2G")
    #conf.set("spark.executor.cores", "4")
    conf.set("spark.driver.memory", "3G")
    conf.set("spark.driver.cores", "3")
    #conf.set("spark.driver.extraClassPath", "/var/nfs/general/mysql-connector-java-5.1.43")
    sc = SparkContext(conf=conf)
    print sc._conf.get("spark.executor.extraClassPath")

    return sc


from pyspark.sql import SQLContext

sc = createSparkContext()
sql = SQLContext(sc)
df = sql.read.format('jdbc').options(url='jdbc:mysql://ip:port?user=user&password=pass', dbtable='(select * from tablename limit 100) as tablename').load()
print df.head()

Thanks

thebeancounter
  • Print your spark-submit command to see how you are using the --jars option; please check my answer [here](https://stackoverflow.com/a/35550151/647053) – Ram Ghadiyaram Aug 27 '17 at 18:22
  • @RamGhadiyaram I do not use spark-submit, I use a SparkContext in Python. See the edit, I will add some code for you to see – thebeancounter Aug 28 '17 at 06:02
  • Could you print the output of `sc.getConf().getAll()`? – MaFF Aug 29 '17 at 17:52
  • @Marie [(u'spark.driver.memory', u'3G'), (u'spark.executor.extraClassPath', u'file:///var/nfs/general/mysql-connector-java-5.1.43.jar'), (u'spark.app.name', u'spark-basic'), (u'spark.app.id', u'app-2017083'), (u'spark.rdd.compress', u'True'), (u'spark.master', u'spark://127.0.0.1:7077'), (u'spark.driver.port', u''), (u'spark.serializer.objectStreamReset', u'100'), (u'spark.executor.memory', u'2G'), (u'spark.executor.id', u'driver'), (u'spark.submit.deployMode', u'client'), (u'spark.driver.host', u''), (u'spark.driver.cores', u'3')] – thebeancounter Aug 30 '17 at 12:31
  • I have edited my answer, I believe the problem doesn't come from the .jar file – MaFF Aug 30 '17 at 14:16

1 Answer


Your SUBMIT_ARGS is going to be passed to spark-submit when you create a SparkContext from Python. You should use --jars instead of --driver-class-path.
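For example, a minimal sketch of that change (the jar path and master URL are reused from the question; adjust both to your environment):

import os
from pyspark import SparkConf, SparkContext

# set the submit args before the SparkContext is created;
# --jars ships the connector to the driver and the executors
os.environ["PYSPARK_SUBMIT_ARGS"] = "--jars /var/nfs/general/mysql-connector-java-5.1.43.jar pyspark-shell"

conf = SparkConf()
conf.setMaster("spark://127.0.0.1:7077")
conf.setAppName('spark-basic')
sc = SparkContext(conf=conf)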

EDIT

Your problem is actually a lot simpler than it seems: you're missing the `driver` parameter in the options:

sql = SQLContext(sc)
df = sql.read.format('jdbc').options(
    url='jdbc:mysql://ip:port', 
    user='user',
    password='pass',
    driver="com.mysql.jdbc.Driver",
    dbtable='(select * from tablename limit 100) as tablename'
).load()

You can also pass user and password as separate arguments instead of embedding them in the URL.
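If you prefer, here is a sketch of the same read through the read.jdbc shortcut, with the credentials and driver passed in a properties dict (the url, table subquery and credentials are the placeholders from the question):

df = sql.read.jdbc(
    url='jdbc:mysql://ip:port',
    table='(select * from tablename limit 100) as tablename',
    properties={
        'user': 'user',
        'password': 'pass',
        'driver': 'com.mysql.jdbc.Driver',
    }
)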

MaFF