I am running a spark-submit command that will do some database work via a Scala class.
spark-submit \
  --verbose \
  --class mycompany.MyClass \
  --conf spark.driver.extraJavaOptions=-Dconfig.resource=dev-test.conf \
  --conf spark.executor.extraJavaOptions=-Dconfig.resource=dev-test.conf \
  --master yarn \
  --driver-library-path /usr/lib/hadoop-lzo/lib/native/ \
  --jars /home/hadoop/mydir/dbp.spark-utils-1.1.0-SNAPSHOT.jar,/usr/lib/phoenix/phoenix-client-hbase-2.4-5.1.2.jar,/usr/lib/hadoop-lzo/lib/hadoop-lzo.jar,/usr/lib/hadoop/lib/commons-compress-1.18.jar,/usr/lib/hadoop/hadoop-aws-3.2.1-amzn-5.jar,/usr/share/aws/aws-java-sdk/aws-java-sdk-bundle-1.12.31.jar \
  --files /home/hadoop/mydir/dev-test.conf \
  --num-executors 1 \
  --executor-memory 3g \
  --driver-memory 3g \
  --queue default \
  /home/hadoop/mydir/dbp.spark-utils-1.1.0-SNAPSHOT.jar \
  <<args to MyClass>>
When I run it, I get an exception:
Caused by: java.sql.SQLException: No suitable driver found for jdbc:phoenix:host1,host2,host3:2181:/hbase;
    at java.sql.DriverManager.getConnection(DriverManager.java:689)
    at java.sql.DriverManager.getConnection(DriverManager.java:208)
    at org.apache.phoenix.util.QueryUtil.getConnection(QueryUtil.java:422)
    at org.apache.phoenix.util.QueryUtil.getConnection(QueryUtil.java:414)
Here are the relevant parts of my Scala code:
val conf: SerializableHadoopConfiguration =
  new SerializableHadoopConfiguration(sc.hadoopConfiguration)
Class.forName("org.apache.phoenix.jdbc.PhoenixDriver")

val tableRowKeyPairs: RDD[(Cell, ImmutableBytesWritable)] =
  df.rdd.mapPartitions(partition => {
    val configuration = conf.get()
    val partitionConn: JavaConnection = QueryUtil.getConnection(configuration)
    // ...
  })
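As a sanity check, I can dump whatever DriverManager has actually registered from inside the same kind of partition closure. This is just a diagnostic sketch of my own (the println output lands in the executor logs, and the trailing count() only forces evaluation):

import java.sql.DriverManager
import scala.collection.JavaConverters._

df.rdd.mapPartitions { partition =>
  // List every JDBC driver registered with DriverManager in this executor JVM.
  // If PhoenixDriver is missing here, getConnection fails with
  // "No suitable driver" even though the class sits in a jar on the classpath.
  DriverManager.getDrivers.asScala.foreach { d =>
    println(s"registered driver: ${d.getClass.getName}")
  }
  partition
}.count()  // force evaluation so the listing runs on the executors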
My spark-submit command includes /usr/lib/phoenix/phoenix-client-hbase-2.4-5.1.2.jar via the --jars argument. When I search that jar for "org.apache.phoenix.jdbc.PhoenixDriver", I find it:
$ jar -tf /usr/lib/phoenix/phoenix-client-hbase-2.4-5.1.2.jar | grep -i driver
...
org/apache/phoenix/jdbc/PhoenixDriver.class
...
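My understanding is that DriverManager.getConnection only consults drivers that have been registered in the current JVM and classloader (normally via the java.sql.Driver service loader or an explicit Class.forName), not every class that happens to sit on a classpath somewhere. A minimal illustration of that distinction, using the real Phoenix class name and the JDBC URL from the exception above (the rest is a sketch):

import java.sql.DriverManager

// Having PhoenixDriver.class in a jar is necessary but not sufficient:
// DriverManager only sees drivers whose static initializer has run and
// called DriverManager.registerDriver(...). Class.forName triggers that
// initializer, but only in the classloader where it executes.
Class.forName("org.apache.phoenix.jdbc.PhoenixDriver")

// Only after that registration does this lookup succeed:
val conn = DriverManager.getConnection("jdbc:phoenix:host1,host2,host3:2181:/hbase")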
So why can't my program locate the driver?