
I need to read some data from a Postgres database from within Spark. Because the SQL queries are quite large (close to 300 lines), I would like to read each query from a SQL file instead of hard-coding it in the Scala source code.

However, I am having trouble reading a SQL file in a way that preserves its syntax. The query runs fine in DBeaver, but from Scala it throws an org.postgresql.util.PSQLException. A typical stack trace points to a seemingly random position in the SQL file and provides no further hints.

I am reading SQL files like this:

val query = sc.textFile("someQuery.sql").collect().mkString("\n")

This gives a string that looks like the original SQL query. The Postgres exception is thrown when I then run

spark.read.jdbc(DBUrl, query, connectionProperties)
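
The file can also be read without going through the RDD API; a minimal sketch (same path as above, assuming the file is on the driver's local filesystem):

import scala.io.Source

// Read the whole file on the driver instead of distributing it as an RDD.
val source = Source.fromFile("someQuery.sql")
val query = try source.mkString finally source.close()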

Edit

After modifying it a bit, the solution given here ended up working for me.

Some notes, for whoever it may concern:

  • always alias your query, i.e. not "select * from tbl" but "(select * from tbl) tmp" (see the sketch after this list);
  • use SparkSession instead of creating a SQLContext from a SparkContext.
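
To make the first note concrete: Spark builds its query as SELECT ... FROM <dbtable> ..., so the value passed as dbtable must be valid in a FROM clause, i.e. a table name or an aliased subquery. A minimal sketch, where query is the string read from the SQL file:

// Wrap the raw query in parentheses and give it an alias so it can
// stand in a FROM clause; the alias name "tmp" is arbitrary.
val aliased_query = s"($query) tmp"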

This is the snippet I ended up with:

val df = spark.read
  .format("jdbc")
  .options(Map(
    "url"      -> jdbcUrl,
    "dbtable"  -> aliased_query,           // the aliased query, not the bare SELECT
    "driver"   -> "org.postgresql.Driver", // Postgres JDBC driver class
    "user"     -> <user>,
    "password" -> <password>))
  .load()

(I know, don't hardcode passwords.)
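
One way around that, for anyone interested: pull the credentials from the environment at runtime. A sketch with hypothetical variable names:

// Hypothetical environment variable names; fail fast if they are missing.
val user = sys.env.getOrElse("PG_USER", sys.error("PG_USER is not set"))
val password = sys.env.getOrElse("PG_PASSWORD", sys.error("PG_PASSWORD is not set"))

These then replace <user> and <password> in the options map above.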
