
I am trying to read a subset of rows from SQL Server as a Spark DataFrame. I can read the entire table, but I am unable to specify a query that reads only a subset of rows.

Here's what I am trying:

val query = "(SELECT * FROM stg.mapping_chunk_stg WHERE mapping_status = 1)"

val df = sqlContext.read.format("jdbc").options(
  Map(
    "url" -> "jdbc:jtds:sqlserver://abc/Property_Content",
    "user" -> "user",
    "driver" -> "net.sourceforge.jtds.jdbc.Driver",
    "password" -> "pwd",
    "dbtable" -> query // reading the full table with "stg.mapping_chunk_stg" here works
  )
).load()

df.show()

I am getting the error below:

Exception in thread "main" java.sql.SQLException: Incorrect syntax near the keyword 'WHERE'.
at net.sourceforge.jtds.jdbc.SQLDiagnostic.addDiagnostic(SQLDiagnostic.java:372)
at net.sourceforge.jtds.jdbc.TdsCore.tdsErrorToken(TdsCore.java:2886)
at net.sourceforge.jtds.jdbc.TdsCore.nextToken(TdsCore.java:2328)
at net.sourceforge.jtds.jdbc.TdsCore.getMoreResults(TdsCore.java:638)
at net.sourceforge.jtds.jdbc.JtdsStatement.executeSQLQuery(JtdsStatement.java:506)
at net.sourceforge.jtds.jdbc.JtdsPreparedStatement.executeQuery(JtdsPreparedStatement.java:979)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:123)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation.<init>(JDBCRelation.scala:91)
at org.apache.spark.sql.execution.datasources.jdbc.DefaultSource.createRelation(DefaultSource.scala:60)
at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:125)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:114)
at com.company.ml.proj.datatransfer.SqlServerDataSource.getData(SqlServerDataSource.scala:24)
at com.company.ml.proj.datatransfer.Main$.main(Main.scala:17)
at com.company.ml.proj.datatransfer.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:674)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

It's mentioned on this page that dbtable can be a SQL query. Any pointers on how to fix this?

If I pass the query without the WHERE clause, I still get the same error.
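Since dbtable ends up inside the FROM clause of the query Spark generates, a parenthesized subquery there normally needs a table alias in SQL Server (which might also explain why dropping my own WHERE clause doesn't help). A minimal sketch of the aliased form with the same connection options, in case it helps narrow things down (the alias name tmp is just a placeholder; I'm not sure whether the missing alias is actually the cause):

val aliasedQuery = "(SELECT * FROM stg.mapping_chunk_stg WHERE mapping_status = 1) AS tmp"

val df = sqlContext.read.format("jdbc").options(
  Map(
    "url" -> "jdbc:jtds:sqlserver://abc/Property_Content",
    "user" -> "user",
    "driver" -> "net.sourceforge.jtds.jdbc.Driver",
    "password" -> "pwd",
    "dbtable" -> aliasedQuery // parenthesized subquery with a table alias
  )
).load()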

Vishal John

0 Answers