I am having issues with reading data from MSSQL server using Cloudera Spark. I am not sure where is the problem and what is causing it.
Here is my build.sbt
val sparkversion = "1.6.0-cdh5.10.1"
name := "SimpleSpark"
organization := "com.huff.spark"
version := "1.0"
scalaVersion := "2.10.5"
mainClass in Compile := Some("com.huff.spark.example.SimpleSpark")
assemblyJarName in assembly := "mssql.jar"
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-streaming-kafka" % "1.6.0" % "provided",
"org.apache.spark" %% "spark-streaming" % "1.6.0" % "provided",
"org.apache.spark" % "spark-core_2.10" % sparkversion % "provided", // to test in cluseter
"org.apache.spark" % "spark-sql_2.10" % sparkversion % "provided" // to test in cluseter
)
resolvers += "Confluent IO" at "http://packages.confluent.io/maven"
resolvers += "Cloudera Repository" at "https://repository.cloudera.com/artifactory/cloudera-repos"
And here is my scala source:
package com.huff.spark.example
import org.apache.spark.sql._
import java.sql.{Connection, DriverManager}
import java.util.Properties
import org.apache.spark.{SparkContext, SparkConf}
object SimpleSpark {
def main(args: Array[String]) {
val sourceProp = new java.util.Properties
val conf = new SparkConf().setAppName("SimpleSpark").setMaster("yarn-cluster") //to test in cluster
val sc = new SparkContext(conf)
var SqlContext = new SQLContext(sc)
val driver = "com.microsoft.sqlserver.jdbc.SQLServerDriver"
val jdbcDF = SqlContext.read.format("jdbc").options(Map("url" -> "jdbc:sqlserver://sqltestsrver;databaseName=LEh;user=sparkaetl;password=sparkaetl","driver" -> driver,"dbtable" -> "StgS")).load()
jdbcDF.show(5)
}
}
And this is the error I see:
17/05/24 04:35:20 ERROR ApplicationMaster: User class threw exception: java.lang.NullPointerException
java.lang.NullPointerException
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:155)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation.<init>(JDBCRelation.scala:91)
at org.apache.spark.sql.DataFrameReader.jdbc(DataFrameReader.scala:222)
at org.apache.spark.sql.DataFrameReader.jdbc(DataFrameReader.scala:146)
at com.huff.spark.example.SimpleSpark$.main(SimpleSpark.scala:16)
at com.huff.spark.example.SimpleSpark.main(SimpleSpark.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:552)
17/05/24 04:35:20 INFO ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: java.lang.NullPointerException)
I know the problem is in line 16 which is:
val jdbcDF = SqlContext.read.format("jdbc").options(Map("url" -> "jdbc:sqlserver://sqltestsrver;databaseName=LEh;user=sparkaetl;password=sparkaetl","driver" -> driver,"dbtable" -> "StgS")).load()
But I can't pinpoint out what exactly is the problem. Is it something to do with access? (which is doubtful), problems with connection parameters (the error message would say it), or something else which I am not aware of. Thanks in advance :-)