I am trying to read a subset of rows from a SQL Server table into a Spark DataFrame. I can read the entire table, but I am unable to specify a query that reads only a subset of rows.
Here's what I am trying:
val query = "(SELECT * FROM stg.mapping_chunk_stg WHERE mapping_status = 1)"
val df = sqlContext.read.format("jdbc").options(
  Map(
    "url" -> "jdbc:jtds:sqlserver://abc/Property_Content",
    "user" -> "user",
    "driver" -> "net.sourceforge.jtds.jdbc.Driver",
    "password" -> "pwd",
    "dbtable" -> query /* "stg.mapping_chunk_stg" works here */
  )
).load()
df.show()
I am getting the error below:
Exception in thread "main" java.sql.SQLException: Incorrect syntax near the keyword 'WHERE'.
at net.sourceforge.jtds.jdbc.SQLDiagnostic.addDiagnostic(SQLDiagnostic.java:372)
at net.sourceforge.jtds.jdbc.TdsCore.tdsErrorToken(TdsCore.java:2886)
at net.sourceforge.jtds.jdbc.TdsCore.nextToken(TdsCore.java:2328)
at net.sourceforge.jtds.jdbc.TdsCore.getMoreResults(TdsCore.java:638)
at net.sourceforge.jtds.jdbc.JtdsStatement.executeSQLQuery(JtdsStatement.java:506)
at net.sourceforge.jtds.jdbc.JtdsPreparedStatement.executeQuery(JtdsPreparedStatement.java:979)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:123)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation.<init>(JDBCRelation.scala:91)
at org.apache.spark.sql.execution.datasources.jdbc.DefaultSource.createRelation(DefaultSource.scala:60)
at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:125)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:114)
at com.company.ml.proj.datatransfer.SqlServerDataSource.getData(SqlServerDataSource.scala:24)
at com.company.ml.proj.datatransfer.Main$.main(Main.scala:17)
at com.company.ml.proj.datatransfer.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:674)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
It's mentioned on this page that dbtable can be a SQL query. Any pointers on how to fix this?
Even if I pass the query without the WHERE clause, I still get the same error.
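For completeness, these are the exact strings I have passed as dbtable (taken from the snippet above); only the WHERE clause differs, and both fail with the same SQLException:

```scala
// The two subqueries I tried as the "dbtable" option — both fail identically.
val queryWithWhere = "(SELECT * FROM stg.mapping_chunk_stg WHERE mapping_status = 1)"
val queryNoWhere   = "(SELECT * FROM stg.mapping_chunk_stg)"
```

So the failure does not seem to be caused by the WHERE clause itself.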