Prepared statement in spark-jdbc

Question

I am trying to read the data from the MSSQL database using Spark jdbc with a specified offset. So the data should be loaded only after the specified timestamp which would be this offset. I tried to implement it by providing a query in jdbc configuraions, however, I did not found a possibility to create a prepared statement with parameterized values. In this case, I want to parameterize an offset which would change after each application launch. How can I implement it using jdbc options?

All the database configs reside in the application.conf file. This is the way I read from the database:

def jdbcOptions(query: String) = Map[String,String](
    "driver" -> config.getString("sqlserver.db.driver"),
    "url" -> config.getString("sqlserver.db.url"),
    "dbtable" -> s"(select * from TestAllData where update_database_time >= '2019-03-19 12:30:00.003') as subq,
    "user" -> config.getString("sqlserver.db.user"),
    "password" -> config.getString("sqlserver.db.password"),
    "customSchema" -> config.getString("sqlserver.db.custom_schema")
  )

    val testDataDF = sparkSession
      .read
      .format("jdbc")
      .options(jdbcOptions())
      .load()

Instead a query should look almost like this:

s"(select * from TestAllData where update_database_time >= $tmstp) as subq

You can just put your $tmstp directly in query if I understand your question correctly. — dytyniak, Apr 02 '19 at 17:34
Because it is not the best and secure way to embed parameters into queries using String and not prepared statements. — Cassie, Apr 03 '19 at 07:59
Are you going to run this job depends on parameters from users? — dytyniak, Apr 03 '19 at 08:06
I am going to provide a changing parameter which I read from Kafka, so I do not want to hardcode the value — Cassie, Apr 03 '19 at 08:15
As far as I know, there is no way to do it using spark built in possibilities. However, you could try third party library that could validate your query before inserting in spark reader. — dytyniak, Apr 03 '19 at 09:39

score 2 · Accepted Answer · answered May 12 '19 at 12:30

There is no prepared statement in Spark-jdbc, so there are not other ways except for setting parameters in a String:

val dayColumn = "update_database_time"
val dayValue = "2019-03-19 12:30:00.003"

s"(select * from TestAllData where $dayColumn > '$dayValue') as subq"

Prepared statement in spark-jdbc

1 Answers1