
I'm trying to read a Postgres/PostGIS table into a Spark 2.0 DataFrame like this:

val jdbcUrl = s"jdbc:postgresql://${host}:${port}/${dbName}"

val connectionProperties = new Properties()
connectionProperties.put("user", s"${user}")
connectionProperties.put("password", s"${password}")
connectionProperties.setProperty("Driver", "org.postgresql.Driver")

def readTable(table: String): DataFrame = {
  spark.read.jdbc(jdbcUrl, s"(select st_astext(geom) as geom from ${table}) as t;", connectionProperties)
}

readTable("myschema.mytable")

I get this error:

org.postgresql.util.PSQLException: ERROR: syntax error at or near "WHERE"

I'm pretty sure this is caused by a WHERE clause that Spark appends to the query, as described in this question.

However, according to the docs, this method should work: https://docs.databricks.com/spark/latest/data-sources/sql-databases.html#pushdown-query-to-database-engine

I need to use a query as the table name because I need to get the PostGIS geometry as a WKT string. My question is: has anyone found a way to read a table with a query as the table name like this? Or does anyone see anything wrong with my code, or perhaps another way to do it? Thanks.

  • Typo. You have unnecessary `;` at the end of the query. Should be `s"(select st_astext(geom) as geom from ${table}) as t"` – zero323 Jul 25 '18 at 00:47
  • 1
    That was it! so simple, thank you – J.Hammond Jul 25 '18 at 18:01
  • 1
    Just an update, since I asked this question I've learned that your query must correctly fit into the from clause of whatever sql spark is running under the hood. So for example if your loading data from another database say Redshift then your table name query must fit into the redshift sql being run. So it would need to look like ```(select column as alias from table)```. – J.Hammond Feb 24 '20 at 14:31
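Putting the comments together, a minimal sketch of the corrected call (assumptions: `host`, `port`, `dbName`, `user`, `password`, and `spark` are defined as in the question; the only functional change is dropping the trailing `;` so the subquery is valid inside Spark's generated FROM clause):

```scala
import java.util.Properties
import org.apache.spark.sql.DataFrame

val jdbcUrl = s"jdbc:postgresql://${host}:${port}/${dbName}"

val connectionProperties = new Properties()
connectionProperties.put("user", user)
connectionProperties.put("password", password)
connectionProperties.setProperty("driver", "org.postgresql.Driver")

// The "table" argument is spliced into Spark's own SQL (e.g. SELECT * FROM <table> WHERE ...),
// so it must be a parenthesized, aliased subquery with NO trailing semicolon.
def readTable(table: String): DataFrame =
  spark.read.jdbc(jdbcUrl, s"(select st_astext(geom) as geom from ${table}) as t", connectionProperties)

val df = readTable("myschema.mytable")
```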
