I am running a spark-submit command that will do some database work via a Scala class.

spark-submit \
  --verbose \
  --class mycompany.MyClass \
  --conf spark.driver.extraJavaOptions=-Dconfig.resource=dev-test.conf \
  --conf spark.executor.extraJavaOptions=-Dconfig.resource=dev-test.conf \
  --master yarn \
  --driver-library-path /usr/lib/hadoop-lzo/lib/native/ \
  --jars /home/hadoop/mydir/dbp.spark-utils-1.1.0-SNAPSHOT.jar,/usr/lib/phoenix/phoenix-client-hbase-2.4-5.1.2.jar,/usr/lib/hadoop-lzo/lib/hadoop-lzo.jar,/usr/lib/hadoop/lib/commons-compress-1.18.jar,/usr/lib/hadoop/hadoop-aws-3.2.1-amzn-5.jar,/usr/share/aws/aws-java-sdk/aws-java-sdk-bundle-1.12.31.jar \
  --files /home/hadoop/mydir/dev-test.conf \
  --num-executors 1 \
  --executor-memory 3g \
  --driver-memory 3g \
  --queue default \
  /home/hadoop/mydir/dbp.spark-utils-1.1.0-SNAPSHOT.jar \
  <<args to MyClass>>

When I run it, I get an exception:

Caused by: java.sql.SQLException: No suitable driver found for jdbc:phoenix:host1,host2,host3:2181:/hbase;
   at java.sql.DriverManager.getConnection(DriverManager.java:689)
   at java.sql.DriverManager.getConnection(DriverManager.java:208)
   at org.apache.phoenix.util.QueryUtil.getConnection(QueryUtil.java:422)
   at org.apache.phoenix.util.QueryUtil.getConnection(QueryUtil.java:414)

Here are the relevant parts of my Scala code:

    // Wrap the Hadoop configuration so it can be shipped to executors
    val conf: SerializableHadoopConfiguration =
        new SerializableHadoopConfiguration(sc.hadoopConfiguration)
    // Register the Phoenix JDBC driver (this line runs on the driver)
    Class.forName("org.apache.phoenix.jdbc.PhoenixDriver")
    val tableRowKeyPairs: RDD[(Cell, ImmutableBytesWritable)] =
        df.rdd.mapPartitions(partition => {
            // This block runs on the executors
            val configuration = conf.get()
            val partitionConn: JavaConnection = QueryUtil.getConnection(configuration)
            // ...
        })

My spark-submit command includes /usr/lib/phoenix/phoenix-client-hbase-2.4-5.1.2.jar using the --jars argument. When I search that file for "org.apache.phoenix.jdbc.PhoenixDriver", I find it:

$ jar -tf /usr/lib/phoenix/phoenix-client-hbase-2.4-5.1.2.jar | grep -i driver
...
org/apache/phoenix/jdbc/PhoenixDriver.class
...

So why can't my program locate the driver?
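
One way to see what is going on is to list the JDBC drivers that DriverManager has actually registered on the executors. A minimal diagnostic sketch (reusing the same sc as above) might look like:

    import java.sql.DriverManager
    import scala.collection.JavaConverters._

    // Run on a few executor partitions and report every registered JDBC driver
    sc.parallelize(1 to 4, 4)
        .mapPartitions(_ => DriverManager.getDrivers.asScala.map(_.getClass.getName))
        .collect()
        .distinct
        .foreach(println)

If org.apache.phoenix.jdbc.PhoenixDriver never appears in that output, then DriverManager on the executors has no registration for the driver, even though the class file is inside the jar.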


1 Answer


I was able to get the program to find the driver by adding the following argument to the spark-submit command shown in the question:

--conf "spark.executor.extraClassPath=/usr/lib/phoenix/phoenix-client-hbase-2.4-5.1.2.jar" 

This StackOverflow article has great explanations for what the various arguments do.
