2

I'm trying to download a JSON file from a Postgres Server on R Studio, however, the fromJSON() method from the rjsonio package only works when the file is 1) on a local directory or 2) on a URL. Does anyone know a way to perform this on R Studio or on Python?

M.Yosh
  • 47
  • 5
  • Side note: I suggest you use [`jsonlite`](https://cran.r-project.org/web/packages/jsonlite/index.html) instead of [`RJSONIO`](https://cran.r-project.org/web/packages/RJSONIO/index.html); the former was updated last year, the latter over four years ago. – r2evans Aug 07 '18 at 18:22
  • 1
    This question is a bit broad and bordering on ["off-topic"](https://stackoverflow.com/help/on-topic) ("*recommend or find a book, tool, software library ...*"). Please include actual code -- as [reproducible as you can](https://stackoverflow.com/questions/5963269) since we do not have your pg instance. Do you have a query that returns a `data.frame` with columns that are just json strings? I don't know how else postgres would make JSON available to you. – r2evans Aug 07 '18 at 18:26
  • By file you mean data contained in the db right and not actual physical file. I think this could be useful https://www.postgresql.org/docs/10/static/datatype-json.html – mad_ Aug 07 '18 at 18:38
  • Yes, the JSON data is contained in the db. I ultimately want to work with the JSON data using dplyr and conduct exploratory analyses. I would prefer not to download the JSON data I would be using, which leads to my initial issue of not wanting to use the local directory. – M.Yosh Aug 07 '18 at 19:08
  • 1
    Based on that, I suggest that your title is misleading. You need to learn how to access the json-encoded data stored in postgres fields. @mad_'s link is good, and another one is http://www.postgresqltutorial.com/postgresql-json/, highlighting things like the json indexing operators, `->`, `-->` and the json-expanding `json_each` function. There is no "file" and you are not specifically "downloading" anything, it is all about *"querying postgres"* and *"dealing with the JSON within it"*. – r2evans Aug 07 '18 at 19:55
  • Thank you @r2evans! Maybe I am still not understanding this completely, but how would I eventually deal with the JSON within the postgres server through R? – M.Yosh Aug 08 '18 at 13:42
  • Correction: the two Postgres JSON operators are `->` and **`->>`** ... – r2evans Aug 08 '18 at 15:41

1 Answers1

2

I'm going out on a limb to infer what you are really needing here. Here is the question that I'm answering:

Title: How do I access json-encoded data in a postgres database?

Body: I need to access the data within JSON data stored in a postgres database. Once I connect, I can see that the data is there, but I don't know how to get the individual elements either in SQL or in R.

Sample Data

A well-crafted question should include sample data, so I'll add a sample table here. This data adapted from http://www.postgresqltutorial.com/postgresql-json/:

library(DBI)
con <- dbConnect(RPostgres::Postgres(), ...) # also works with `odbc::odbc()`, untested with `RODBC`
dbExecute(con, "
  create table mytable (
    id SERIAL8 PRIMARY KEY NOT NULL,
    valtext TEXT,
    jsontext JSON
  )")
d <- data.frame(
  valtext = c('{ "customer": "John Doe", "items": {"product": "Beer","qty": 6}}',
              '{ "customer": "Lily Bush", "items": {"product": "Diaper","qty": 24}}',
              '{ "customer": "Josh William", "items": {"product": "Toy Car","qty": 1}}',
              '{ "customer": "Mary Clark", "items": {"product": "Toy Train","qty": 2}}')
 )
d$jsontext <- d$valtext
dbWriteTable(con, "mytable", d, append=TRUE)
dbGetQuery(con, "select * from mytable")
#   id                                                                 valtext
# 1  1        { "customer": "John Doe", "items": {"product": "Beer","qty": 6}}
# 2  2    { "customer": "Lily Bush", "items": {"product": "Diaper","qty": 24}}
# 3  3 { "customer": "Josh William", "items": {"product": "Toy Car","qty": 1}}
# 4  4 { "customer": "Mary Clark", "items": {"product": "Toy Train","qty": 2}}
#                                                                  jsontext
# 1        { "customer": "John Doe", "items": {"product": "Beer","qty": 6}}
# 2    { "customer": "Lily Bush", "items": {"product": "Diaper","qty": 24}}
# 3 { "customer": "Josh William", "items": {"product": "Toy Car","qty": 1}}
# 4 { "customer": "Mary Clark", "items": {"product": "Toy Train","qty": 2}}

Extract JSON in-DB

If you continue with the tutorial, you'll see that Postgres includes two operators that work only on fields of type JSON:

dbGetQuery(con, "select id from mytable
                 where cast(jsontext -> 'customer' as text) like '%William%'")
#   id
# 1  3

but not TEXT or similar:

dbGetQuery(con, "select id from mytable
                 where cast(valtext -> 'customer' as text) like '%William%'")
# Error in result_create(conn@ptr, statement) (from functions.R#284) : 
#   ERROR:  operator does not exist: text -> unknown
# LINE 2:                  where cast(valtext -> 'customer' as text) l...
#                                             ^
# HINT:  No operator matches the given name and argument type(s). You might need to add explicit type casts.

You can retrieve individual components in a similar fashion:

dbGetQuery(con, "select jsontext -> 'items' -> 'qty' as quantity from mytable
                 where cast(jsontext -> 'items' -> 'product' as text) like '%Toy%'")
#   quantity
# 1        1
# 2        2

Extract JSON in-R

ret <- dbGetQuery(con, "select jsontext from mytable
                        where cast(jsontext -> 'items' -> 'product' as text) like '%Toy%'")
ret
#                                                                  jsontext
# 1 { "customer": "Josh William", "items": {"product": "Toy Car","qty": 1}}
# 2 { "customer": "Mary Clark", "items": {"product": "Toy Train","qty": 2}}

The brute-force (but still functional) approach is to apply the fromJSON function on each field. (I'm using jsonlite, but I think RJSONIO will work just as well here:

lapply(ret$jsontext, jsonlite::fromJSON)
# [[1]]
# [[1]]$customer
# [1] "Josh William"
# [[1]]$items
# [[1]]$items$product
# [1] "Toy Car"
# [[1]]$items$qty
# [1] 1
# [[2]]
# [[2]]$customer
# [1] "Mary Clark"
# [[2]]$items
# [[2]]$items$product
# [1] "Toy Train"
# [[2]]$items$qty
# [1] 2

Another method that jsonlite offers is via jsonlite::stream_in; I tried using RJSONIO::readJSONStream but couldn't get it to work. I didn't try hard, I'm hoping it's just as easy.

jsonlite::stream_in(textConnection(ret$jsontext))
#  Imported 2 records. Simplifying...
#       customer items.product items.qty
# 1 Josh William       Toy Car         1
# 2   Mary Clark     Toy Train         2

There are options to jsonlite::fromJSON that can be used with jsonlite::stream_in as well: I often need simplifyDataFrame=FALSE when the data is nested in a fashion more complex than above (notice how the '"items"' dictionary with two elements is "flattened" into "items.product" and "items.qty", a side-effect of the default action simplifyDataFrame=TRUE).

There is more to that tutorial, and likely countless other resources as well.

Community
  • 1
  • 1
r2evans
  • 141,215
  • 6
  • 77
  • 149