I'm working on an integration where I read data from a simple GCP Spanner table into a Spark job running on a Dataproc cluster. For this I'm using the google-cloud-spanner-jdbc dependency in pom.xml. The job runs without any exception, but it returns an empty DataFrame that has the correct column names. I'd like to use Spark's native JDBC data source (spark.read.format("jdbc")) rather than a plain JDBC connection.
PS: I have already done this integration successfully with the plain JDBC approach; a simplified version of that code is included below for reference.
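For reference, the plain JDBC version that does return rows looks roughly like this (a simplified sketch, not my exact code; the DriverManager-based connection, the emp table, and printing the first column are just for illustration):

import java.sql.DriverManager

val url = s"jdbc:cloudspanner:/projects/$projectId/instances/$instanceId/databases/$databaseId"
val connection = DriverManager.getConnection(url)  // relies on the default GCP credentials available on the cluster
try {
  val resultSet = connection.createStatement().executeQuery("SELECT * FROM emp")
  while (resultSet.next()) {
    println(resultSet.getString(1))  // print the first column of each row, just to verify data comes back
  }
} finally {
  connection.close()
}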
Below is the Spark code snippet that returns the empty DataFrame:
val spannerOptions = Map(
  "url"    -> s"jdbc:cloudspanner:/projects/$projectId/instances/$instanceId/databases/$databaseId",
  "driver" -> "com.google.cloud.spanner.jdbc.JdbcDriver"
)

val df = spark.read.format("jdbc")
  .options(spannerOptions)
  // .option("dbtable", tableName)  // tried this instead of "query" as well
  .option("query", "select * from emp")
  // .schema(schema)                // tried specifying the schema explicitly as well
  .load()

df.show()  // prints the correct column names but zero rows