With the code below, I am trying to connect to SAP HANA from spark-shell and pull the data from a particular table:
spark-submit --properties-file /users/xxx/spark-defaults.conf
./spark-shell --properties-file /users/xxx/spark-defaults.conf
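The properties file is used to put the SAP HANA JDBC driver (ngdbc.jar) on the classpath; the relevant entries would look roughly like this (the jar path is a placeholder):

spark.driver.extraClassPath /path/to/ngdbc.jar
spark.executor.extraClassPath /path/to/ngdbc.jar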
val sparksqlContext = new org.apache.spark.sql.SQLContext(sc)
val driver = "com.sap.db.jdbc.Driver"
val url = "jdbc:sap://yyyyyy:12345"
val database = "STAGING"
val username = "uuuuu"
val password = "zzzzzz"
val table_view = "STAGING.Tablename"
val jdbcDF = sparksqlContext.read.format("jdbc")
  .option("driver", driver)
  .option("url", url)
  .option("databaseName", database)
  .option("user", username)
  .option("password", password)
  .option("dbtable", table_view)
  .option("partitionColumn", "INSTANCE_ID")
  .option("lowerBound", "7418403")
  .option("upperBound", "987026473")
  .option("numPartitions", "5")
  .load()
jdbcDF.cache
jdbcDF.createOrReplaceTempView("TESTING_hanaCopy")
val results = sparksqlContext.sql("select * from TESTING_hanaCopy")
val resultsCounts = sparksqlContext.sql("select count(*) from TESTING_hanaCopy")
val countsval = results.count()
resultsCounts.show()
The error is as follows:
scala> resultsCounts.show()
org.apache.spark.SparkException: Job aborted due to stage failure: Task not serializable: java.io.NotSerializableException: com.sap.db.jdbc.topology.Host
Serialization stack:
	- object not serializable (class: com.sap.db.jdbc.topology.Host, value: yyyyyy:12345)
	- writeObject data (class: java.util.ArrayList)
	- object (class java.util.ArrayList, [yyyyyy:12345])
	- writeObject data (class: java.util.Hashtable)
	- object (class java.util.Properties, {databasename=STAGING, dburl=jdbc:sap://yyyyyy:12345, user=uuuuu, password=zzzzzz, hostlist=[yyyyyy:12345]})
	- field (class: org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions, name: asConnectionProperties, type: class java.util.Properties)
	- object (class org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions, org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions@7cd755a1)
	- field (class: org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$1, name: options$1, type: class org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions)
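Reading the serialization stack bottom-up, Spark is trying to serialize the JDBC connection Properties (JDBCOptions.asConnectionProperties), into which the HANA driver has put a hostlist of com.sap.db.jdbc.topology.Host objects, and Host is not serializable. For comparison, here is the same query over plain JDBC, which never goes through Spark's closure serializer (a minimal sketch using the same placeholders as above):

import java.sql.DriverManager

// Plain JDBC on the driver only; nothing here is shipped to executors.
Class.forName("com.sap.db.jdbc.Driver")
val conn = DriverManager.getConnection("jdbc:sap://yyyyyy:12345", "uuuuu", "zzzzzz")
val stmt = conn.createStatement()
val rs = stmt.executeQuery("SELECT COUNT(*) FROM STAGING.Tablename")
while (rs.next()) println(rs.getLong(1))
rs.close(); stmt.close(); conn.close()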
I tried to understand the solutions provided here and here, but I could not figure out what to change in the code above.