4

I'm having an issue reading data via a custom JDBC driver with Spark. How would I go about overriding the SQL dialect that Spark infers from the JDBC URL?

The database in question is Vitess (https://github.com/youtube/vitess), which runs a MySQL variant, so I want to specify the MySQL dialect. The JDBC URL begins with jdbc:vitess/

Otherwise the DataFrameReader infers the default dialect, which uses " as its identifier quote character. As a result, queries via spark.read.jdbc get sent as

Select "id", "col2", "col3", "etc" from table

which selects the string literals rather than the column values, instead of

Select id, col2, col3, etc from table

Dan Kohn
Smith

2 Answers

8

Maybe it's too late, but here's the answer:

Create your own custom dialect, as I did for the ClickHouse database (my JDBC connection URL looks like this: jdbc:clickhouse://localhost:8123):

import org.apache.spark.sql.jdbc.JdbcDialect

private object ClickHouseDialect extends JdbcDialect {
  // Override the quoting logic as you wish; here identifiers are passed through unquoted
  override def quoteIdentifier(colName: String): String = colName

  override def canHandle(url: String): Boolean = url.startsWith("jdbc:clickhouse")
}
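
For the Vitess case in the question, the same pattern should work. Here's a minimal sketch (VitessDialect is a hypothetical name, and the backtick quoting simply mirrors MySQL's identifier rules):

import org.apache.spark.sql.jdbc.JdbcDialect

// Hypothetical dialect for Vitess: claim URLs with the jdbc:vitess prefix
// and quote identifiers with backticks, MySQL-style.
private object VitessDialect extends JdbcDialect {
  override def canHandle(url: String): Boolean = url.startsWith("jdbc:vitess")
  override def quoteIdentifier(colName: String): String = s"`$colName`"
}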

Then register the dialect somewhere in your code, like this:

JdbcDialects.registerDialect(ClickHouseDialect)
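
Registration is a process-wide side effect, so it has to run before the first read that should pick up the dialect. For the Vitess case, the whole flow might look like this (the URL and table name are placeholders):

import java.util.Properties
import org.apache.spark.sql.jdbc.JdbcDialects

// Register the custom dialect before any spark.read.jdbc call;
// Spark matches dialects against the URL via canHandle.
JdbcDialects.registerDialect(VitessDialect)

val props = new Properties()
props.put("user", "username")
props.put("password", "password")
val df = spark.read.jdbc("jdbc:vitess/...", "tablename", props) // placeholders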
0

You can do something like this:

val jdbcDF = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql:dbserver")
  .option("dbtable", "schema.tablename")
  .option("user", "username")
  .option("password", "password")
  .load()
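
If you're on a custom driver such as Vitess's, Spark may also need the driver class spelled out via the driver option. A sketch under that assumption (the class name io.vitess.jdbc.VitessDriver is my guess at the Vitess driver; substitute whatever class your driver actually ships):

val vitessDF = spark.read
  .format("jdbc")
  .option("url", "jdbc:vitess/...")                // placeholder URL from the question
  .option("driver", "io.vitess.jdbc.VitessDriver") // assumed driver class name
  .option("dbtable", "tablename")
  .load()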

For more info, check the Spark documentation on JDBC data sources (https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html).

You can also specify the connection properties this way:

import java.util.Properties

val connectionProperties = new Properties()
connectionProperties.put("user", "username")
connectionProperties.put("password", "password")
val jdbcDF2 = spark.read
  .jdbc("jdbc:postgresql:dbserver", "schema.tablename", connectionProperties)
Shankar
  • What would be the best way to specify a SQL dialect without using the JDBC URL? The issue is that I am using a custom JDBC driver with the URL prefix "jdbc:vitess". The driver needs a URL specified as "jdbc:vitess", but I want Spark to interpret the connection with the MySQL dialect. – Smith Nov 08 '16 at 23:46