
I'm trying to use JDBC in a Scala Spark application, and I'm compiling with sbt. However, when I add the line Class.forName("com.mysql.jdbc.Driver"), it throws a ClassNotFoundException.

My sbt file is this:

name := "SparkApp"
version := "1.0"
scalaVersion := "2.11.8"
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.1.0"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.1.0"
libraryDependencies += "com.databricks" %% "spark-csv" % "1.5.0"
libraryDependencies += "org.apache.spark" %% "spark-mllib" % "2.1.0"
libraryDependencies += "mysql" % "mysql-connector-java" % "6.0.5"

As far as I can tell, that last line is all I should need to add the JDBC driver, but it doesn't seem to be working. I've also tried Class.forName("com.mysql.jdbc.Driver").newInstance(), but it has the same result, so I assume the issue is that the JDBC classes aren't being added correctly at all.
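
For context, here's a minimal sketch of the kind of call that fails (the object name and connection details are illustrative, not my actual code):

import java.sql.DriverManager

object SparkApp {
  def main(args: Array[String]): Unit = {
    // throws java.lang.ClassNotFoundException: com.mysql.jdbc.Driver
    Class.forName("com.mysql.jdbc.Driver")
    val connection = DriverManager.getConnection(
      "jdbc:mysql://localhost:3306/mydb", "user", "password")
    connection.close()
  }
}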

cogm

4 Answers


You don't need to supply the class name to use JDBC to load data frames. Following the Spark SQL documentation, you only have to supply "jdbc" as the data source format (and indeed add the connector as a dependency) and set the right options:

val host: String = ???
val port: Int = ???
val database: String = ???
val table: String = ???
val user: String = ???
val password: String = ???

val options = Map(
      "url" -> s"jdbc:mysql://$host:$port/$database?zeroDateTimeBehavior=convertToNull",
      "dbtable" -> table,
      "user" -> user,
      "password" -> password)

val df = spark.read.format("jdbc").options(options).load()
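
Here spark is the usual SparkSession entry point; if it isn't already in scope, a minimal setup looks like this (the app name is just an example):

import org.apache.spark.sql.SparkSession

// entry point for the DataFrame API in Spark 2.x
val spark = SparkSession.builder()
  .appName("SparkApp")
  .getOrCreate()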

When you submit your application to Spark, you have to either include the MySQL connector into your final jar file, or tell spark-submit to get the package as a dependency:

spark-submit --packages mysql:mysql-connector-java:6.0.5 ...

This flag also works with spark-shell and pyspark.
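
Writing a DataFrame back to MySQL works the same way, either through the same options map with format("jdbc") or via the jdbc method on DataFrameWriter. A sketch, reusing the placeholder values defined above:

// assumes `dataFrame` is the DataFrame you want to persist
val props = new java.util.Properties()
props.setProperty("user", user)
props.setProperty("password", password)

dataFrame.write
  .mode("append")
  .jdbc(s"jdbc:mysql://$host:$port/$database", table, props)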

sgvd
    What about for writing to the DB? My end goal was to take a DataFrame I have and write that into a table in MySQL. I tried doing that like this, but got a "No suitable driver" error: `val prop = new java.util.Properties() prop.setProperty("user", "username") prop.setProperty("password", "password") dataFrame.write.mode("append").jdbc("jdbc:mysql://localhost:3306/database", "table", prop)` – cogm Mar 01 '17 at 16:13
  • Writing should work similarly, with the same options. How do you run your code? If you submit it, see my edits on making sure the dependency is available. – sgvd Mar 01 '17 at 16:27
  • Added that packages argument and that seems to have solved the driver issue. Initially gave a deprecated error but I just had to change to `com.mysql.cj.jdbc.Driver`. Would have thought that the sbt dependency would be enough. Thanks! – cogm Mar 01 '17 at 16:41

Your MySQL driver class com.mysql.jdbc.Driver is not present on your classpath at runtime. If you are running your Spark job with spark-submit, then you have at least two options:

  • provide the --jars option to specify the path to the mysql-*.jar (see this post) (if both the worker and the driver need the class, take a close look at spark.executor.extraClassPath and spark.driver.extraClassPath)
  • build an uber jar (fat jar) that includes the mysql-* classes in your application jar (see this post); a minimal sbt-assembly setup is sketched after this list
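
For the uber jar route, a minimal sbt-assembly setup could look like this (the plugin version is just an example; adjust it to your build):

// project/plugins.sbt
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.5")

// build.sbt: mark Spark itself as provided so it is not bundled,
// but keep the MySQL connector inside the fat jar
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.1.0" % "provided"
libraryDependencies += "mysql" % "mysql-connector-java" % "6.0.5"

Running sbt assembly then produces a single jar (by default under target/scala-2.11/) that you can pass to spark-submit.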
dumitru

spark-submit \
  --class com.mypack.MyClass \
  --master yarn --deploy-mode cluster \
  --conf spark.executor.extraClassPath=$POSTGRESQL_JAR_PATH:$MYSQL_JAR_PATH \
  --conf spark.driver.extraClassPath=$POSTGRESQL_JAR_PATH:$MYSQL_JAR_PATH \
  ...

where $POSTGRESQL_JAR_PATH and $MYSQL_JAR_PATH should be set to the HDFS paths of the jar files.

Hope this helps.

Use spark.executor.extraClassPath if you are running in cluster mode and spark.driver.extraClassPath if you are running in local mode.

I recommend setting both options to be on the safer side.
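
Alternatively, if every job needs these drivers, the same settings can go once into conf/spark-defaults.conf instead of on each submit (a sketch; the paths below are placeholders for your actual jar locations):

# conf/spark-defaults.conf
spark.executor.extraClassPath  /path/to/postgresql.jar:/path/to/mysql-connector-java.jar
spark.driver.extraClassPath    /path/to/postgresql.jar:/path/to/mysql-connector-java.jar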

desaiankitb

You should pass the driver jar while submitting the Spark job, like below:

1) spark-submit --jars mysql-connector-java-5.1.39.jar plus the rest of the parameters you are already passing

2) If you just want to try it locally using the shell: spark-shell --jars mysql-connector-java-5.1.39.jar

Update the driver jar to the version you already have available and provide its absolute path.
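
For example, a full submit could look something like this (the class name, paths and jar names are illustrative, reusing values from earlier in the thread):

spark-submit \
  --class com.mypack.MyClass \
  --master yarn \
  --jars /absolute/path/to/mysql-connector-java-5.1.39.jar \
  /absolute/path/to/your-application.jar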