
The current PostgreSQL version (9.4) supports the json and jsonb data types, as described in http://www.postgresql.org/docs/9.4/static/datatype-json.html

For instance, JSON data stored as jsonb can be queried with SQL:

SELECT jdoc->'guid', jdoc->'name'
FROM api 
WHERE jdoc @> '{"company": "Magnafone"}';

As a Spark user, is it possible to send this query to PostgreSQL via JDBC and receive the result as a DataFrame?

What I have tried so far:

val url = "jdbc:postgresql://localhost:5432/mydb?user=foo&password=bar"
val df = sqlContext.load("jdbc",
  Map("url"->url,"dbtable"->"mydb", "driver"->"org.postgresql.Driver"))
df.registerTempTable("table")
sqlContext.sql("SELECT data->'myid' FROM table")

But sqlContext.sql() could not parse the data->'myid' part of the query.

  • I think I am one step closer: I can embed the desired PostgreSQL query as a subquery and supply it as 'dbtable' when I run sqlContext.load, e.g.: val df = sqlContext.load(..., "dbtable"->"(SELECT data->'myid' FROM table) AS foo") (see the sketch after these comments). – CJLam Dec 28 '15 at 04:32
  • Combining this with the answer in https://stackoverflow.com/questions/34460496/from-postgres-jsonb-into-spark-jsonrdd, I think my problem is solved. Any other advice is still welcome. – CJLam Dec 28 '15 at 04:37
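
For reference, here is a minimal sketch of that workaround, assuming the same url, sqlContext, and driver as in the question, and the api table with jsonb column jdoc from the first example (the alias tmp and table name api_fields are illustrative):

val url = "jdbc:postgresql://localhost:5432/mydb?user=foo&password=bar"

// Push the jsonb field extraction down to PostgreSQL by wrapping it in a
// subquery and passing that subquery as the "dbtable" option. Note the ->>
// operator: unlike ->, it returns text, so Spark receives plain string columns.
// The subquery alias (AS tmp) is required, since Spark wraps this in its own SELECT.
val query = "(SELECT jdoc->>'guid' AS guid, jdoc->>'name' AS name FROM api) AS tmp"

val df = sqlContext.load("jdbc",
  Map("url" -> url, "dbtable" -> query, "driver" -> "org.postgresql.Driver"))

df.registerTempTable("api_fields")
sqlContext.sql("SELECT guid, name FROM api_fields").show()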

1 Answer


It is not possible to query json / jsonb fields dynamically through the Spark DataFrame API. Once the data is fetched into Spark it is converted to a string and is no longer a queryable structure (see SPARK-7869).

As you've already discovered, you can use the dbtable / table argument to pass a subquery directly to the source and use it to extract the fields of interest. Pretty much the same rule applies to any non-standard type, to calling stored procedures, and to any other extension; a sketch follows.
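
As a concrete illustration, here is a hedged sketch that pushes the question's original query, including the jsonb containment predicate, down to PostgreSQL (it reuses url and sqlContext from the question; the alias filtered is arbitrary):

// The whole query, including the @> containment test, runs inside
// PostgreSQL; Spark only ever sees the resulting text columns.
val pushedDown =
  """(SELECT jdoc->>'guid' AS guid, jdoc->>'name' AS name
    |   FROM api
    |   WHERE jdoc @> '{"company": "Magnafone"}') AS filtered""".stripMargin

val companies = sqlContext.load("jdbc",
  Map("url" -> url, "dbtable" -> pushedDown, "driver" -> "org.postgresql.Driver"))

companies.show()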

– zero323