I want to execute the following query on a remote Postgres server from a PySpark application using the JDBC connector:
SELECT id, postgres_function(some_column) FROM my_database GROUP BY id
The problem is that I can't run this kind of query in PySpark with spark.sql(QUERY), evidently because postgres_function is not one of the ANSI SQL functions that Spark SQL has supported since 2.0.0.
I'm using Spark 2.0.1 and Postgres 9.4.
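
For context, here is a minimal sketch of the kind of thing I'm after: handing the whole query to Postgres so that Spark never has to parse postgres_function itself. As far as I understand, the JDBC data source's dbtable option accepts anything valid in a FROM clause, including a parenthesized subquery with an alias. The helper names (pushdown_options, load_from_postgres), the alias pushed_down, and the URL and credentials below are all hypothetical placeholders, not part of my real setup.

```python
def pushdown_options(query, url, user, password):
    """Build JDBC reader options so that Postgres, not Spark, evaluates `query`."""
    # Drop surrounding whitespace and a trailing semicolon so the text is valid as a subquery.
    inner = query.strip().rstrip(";").strip()
    return {
        "url": url,
        "driver": "org.postgresql.Driver",
        # A parenthesized subquery with an alias stands in for a table name.
        "dbtable": "({}) AS pushed_down".format(inner),
        "user": user,
        "password": password,
    }


def load_from_postgres(spark, options):
    # `spark` is an existing SparkSession; the Postgres JDBC driver jar must be on the classpath.
    return spark.read.format("jdbc").options(**options).load()


QUERY = "SELECT id, postgres_function(some_column) FROM my_database GROUP BY id"

# Hypothetical connection details, for illustration only.
opts = pushdown_options(
    QUERY,
    url="jdbc:postgresql://dbhost:5432/mydb",
    user="spark_user",
    password="secret",
)
```

With a live session this would presumably be used as df = load_from_postgres(spark, opts), and Spark would only see the result columns returned by Postgres.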