
With the code below I am trying to connect to HANA from spark-shell and pull the data from a particular table:

    # shell: launch spark-shell with the custom properties file
    spark-submit --properties-file /users/xxx/spark-defaults.conf
    ./spark-shell --properties-file /users/xxx/spark-defaults.conf

    // Scala, inside spark-shell
    val sparksqlContext = new org.apache.spark.sql.SQLContext(sc)
    val driver   = "com.sap.db.jdbc.Driver"
    val url      = "jdbc:sap://yyyyyy:12345"
    val database = "STAGING"
    val username = "uuuuu"
    val password = "zzzzzz"
    val table_view = "STAGING.Tablename"

    // read the HANA table over JDBC, split into 5 partitions on INSTANCE_ID
    val jdbcDF = sparksqlContext.read.format("jdbc")
      .option("driver", driver)
      .option("url", url)
      .option("databaseName", database)
      .option("user", username)
      .option("password", password)
      .option("dbtable", table_view)
      .option("partitionColumn", "INSTANCE_ID")
      .option("lowerBound", "7418403")
      .option("upperBound", "987026473")
      .option("numPartitions", "5")
      .load()

    jdbcDF.cache()
    jdbcDF.createOrReplaceTempView("TESTING_hanaCopy")
    val results = sparksqlContext.sql("select * from TESTING_hanaCopy")
    val resultsCounts = sparksqlContext.sql("select count(*) from TESTING_hanaCopy")
    val countsval = results.count()
    resultsCounts.show()

The error is as below:

    scala> resultsCounts.show()
    org.apache.spark.SparkException: Job aborted due to stage failure: Task not serializable: java.io.NotSerializableException: com.sap.db.jdbc.topology.Host
    Serialization stack:
    - object not serializable (class: com.sap.db.jdbc.topology.Host, value: yyyyyy:12345)
    - writeObject data (class: java.util.ArrayList)
    - object (class java.util.ArrayList, [yyyyyy:12345])
    - writeObject data (class: java.util.Hashtable)
    - object (class java.util.Properties, {databasename=STAGING, dburl=jdbc:sap://yyyyyy:12345, user=uuuuu, password=zzzzzz, hostlist=[yyyyyy:12345]})
    - field (class: org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions, name: asConnectionProperties, type: class java.util.Properties)
    - object (class org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions, org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions@7cd755a1)
    - field (class: org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$1, name: options$1, type: class org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions)

I tried to understand the solutions provided here and here, but could not work out what to change in the code above.


1 Answer


The "Note" section from this blog post resolved the issue:

Note: I’ve tested Spark using the recent SPS12 version of the Hana JDBC Driver (ngdbc.jar) against a SPS10 & SPS12 system and both seemed to work fine. Older versions of the driver give the following error in Spark: ‘org.apache.spark.SparkException: Job aborted due to stage failure: Task not serializable: java.io.NotSerializableException: com.sap.db.jdbc.topology.Host’
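The underlying cause is visible in the serialization stack: Spark ships the JDBC connection Properties to the executors as part of each task, and the older driver stashes a list of com.sap.db.jdbc.topology.Host objects in those Properties; since Host does not implement Serializable, every task fails. Swapping in the SPS12 (or newer) ngdbc.jar is enough. As a sketch, with the jar path being a placeholder for wherever the updated driver actually lives, the new jar can be put on both the driver and executor classpaths when launching spark-shell:

    # path below is a placeholder for the updated SPS12+ HANA driver jar
    ./spark-shell --properties-file /users/xxx/spark-defaults.conf \
      --jars /path/to/ngdbc.jar \
      --driver-class-path /path/to/ngdbc.jar

Here --jars distributes the jar to the executors and --driver-class-path makes it visible to the driver JVM; with the updated driver in place, the exact code from the question should run without the NotSerializableException.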
