I'm trying to download a JSON file from a Postgres Server on R Studio, however, the fromJSON() method from the rjsonio package only works when the file is 1) on a local directory or 2) on a URL. Does anyone know a way to perform this on R Studio or on Python?
-
Side note: I suggest you use [`jsonlite`](https://cran.r-project.org/web/packages/jsonlite/index.html) instead of [`RJSONIO`](https://cran.r-project.org/web/packages/RJSONIO/index.html); the former was updated last year, the latter over four years ago. – r2evans Aug 07 '18 at 18:22
-
1This question is a bit broad and bordering on ["off-topic"](https://stackoverflow.com/help/on-topic) ("*recommend or find a book, tool, software library ...*"). Please include actual code -- as [reproducible as you can](https://stackoverflow.com/questions/5963269) since we do not have your pg instance. Do you have a query that returns a `data.frame` with columns that are just json strings? I don't know how else postgres would make JSON available to you. – r2evans Aug 07 '18 at 18:26
-
By file you mean data contained in the db right and not actual physical file. I think this could be useful https://www.postgresql.org/docs/10/static/datatype-json.html – mad_ Aug 07 '18 at 18:38
-
Yes, the JSON data is contained in the db. I ultimately want to work with the JSON data using dplyr and conduct exploratory analyses. I would prefer not to download the JSON data I would be using, which leads to my initial issue of not wanting to use the local directory. – M.Yosh Aug 07 '18 at 19:08
-
1Based on that, I suggest that your title is misleading. You need to learn how to access the json-encoded data stored in postgres fields. @mad_'s link is good, and another one is http://www.postgresqltutorial.com/postgresql-json/, highlighting things like the json indexing operators, `->`, `-->` and the json-expanding `json_each` function. There is no "file" and you are not specifically "downloading" anything, it is all about *"querying postgres"* and *"dealing with the JSON within it"*. – r2evans Aug 07 '18 at 19:55
-
Thank you @r2evans! Maybe I am still not understanding this completely, but how would I eventually deal with the JSON within the postgres server through R? – M.Yosh Aug 08 '18 at 13:42
-
Correction: the two Postgres JSON operators are `->` and **`->>`** ... – r2evans Aug 08 '18 at 15:41
1 Answers
I'm going out on a limb to infer what you are really needing here. Here is the question that I'm answering:
Title: How do I access json-encoded data in a postgres database?
Body: I need to access the data within JSON data stored in a postgres database. Once I connect, I can see that the data is there, but I don't know how to get the individual elements either in SQL or in R.
Sample Data
A well-crafted question should include sample data, so I'll add a sample table here. This data adapted from http://www.postgresqltutorial.com/postgresql-json/:
library(DBI)
con <- dbConnect(RPostgres::Postgres(), ...) # also works with `odbc::odbc()`, untested with `RODBC`
dbExecute(con, "
create table mytable (
id SERIAL8 PRIMARY KEY NOT NULL,
valtext TEXT,
jsontext JSON
)")
d <- data.frame(
valtext = c('{ "customer": "John Doe", "items": {"product": "Beer","qty": 6}}',
'{ "customer": "Lily Bush", "items": {"product": "Diaper","qty": 24}}',
'{ "customer": "Josh William", "items": {"product": "Toy Car","qty": 1}}',
'{ "customer": "Mary Clark", "items": {"product": "Toy Train","qty": 2}}')
)
d$jsontext <- d$valtext
dbWriteTable(con, "mytable", d, append=TRUE)
dbGetQuery(con, "select * from mytable")
# id valtext
# 1 1 { "customer": "John Doe", "items": {"product": "Beer","qty": 6}}
# 2 2 { "customer": "Lily Bush", "items": {"product": "Diaper","qty": 24}}
# 3 3 { "customer": "Josh William", "items": {"product": "Toy Car","qty": 1}}
# 4 4 { "customer": "Mary Clark", "items": {"product": "Toy Train","qty": 2}}
# jsontext
# 1 { "customer": "John Doe", "items": {"product": "Beer","qty": 6}}
# 2 { "customer": "Lily Bush", "items": {"product": "Diaper","qty": 24}}
# 3 { "customer": "Josh William", "items": {"product": "Toy Car","qty": 1}}
# 4 { "customer": "Mary Clark", "items": {"product": "Toy Train","qty": 2}}
Extract JSON in-DB
If you continue with the tutorial, you'll see that Postgres includes two operators that work only on fields of type JSON
:
dbGetQuery(con, "select id from mytable
where cast(jsontext -> 'customer' as text) like '%William%'")
# id
# 1 3
but not TEXT
or similar:
dbGetQuery(con, "select id from mytable
where cast(valtext -> 'customer' as text) like '%William%'")
# Error in result_create(conn@ptr, statement) (from functions.R#284) :
# ERROR: operator does not exist: text -> unknown
# LINE 2: where cast(valtext -> 'customer' as text) l...
# ^
# HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.
You can retrieve individual components in a similar fashion:
dbGetQuery(con, "select jsontext -> 'items' -> 'qty' as quantity from mytable
where cast(jsontext -> 'items' -> 'product' as text) like '%Toy%'")
# quantity
# 1 1
# 2 2
Extract JSON in-R
ret <- dbGetQuery(con, "select jsontext from mytable
where cast(jsontext -> 'items' -> 'product' as text) like '%Toy%'")
ret
# jsontext
# 1 { "customer": "Josh William", "items": {"product": "Toy Car","qty": 1}}
# 2 { "customer": "Mary Clark", "items": {"product": "Toy Train","qty": 2}}
The brute-force (but still functional) approach is to apply the fromJSON
function on each field. (I'm using jsonlite
, but I think RJSONIO
will work just as well here:
lapply(ret$jsontext, jsonlite::fromJSON)
# [[1]]
# [[1]]$customer
# [1] "Josh William"
# [[1]]$items
# [[1]]$items$product
# [1] "Toy Car"
# [[1]]$items$qty
# [1] 1
# [[2]]
# [[2]]$customer
# [1] "Mary Clark"
# [[2]]$items
# [[2]]$items$product
# [1] "Toy Train"
# [[2]]$items$qty
# [1] 2
Another method that jsonlite
offers is via jsonlite::stream_in
; I tried using RJSONIO::readJSONStream
but couldn't get it to work. I didn't try hard, I'm hoping it's just as easy.
jsonlite::stream_in(textConnection(ret$jsontext))
# Imported 2 records. Simplifying...
# customer items.product items.qty
# 1 Josh William Toy Car 1
# 2 Mary Clark Toy Train 2
There are options to jsonlite::fromJSON
that can be used with jsonlite::stream_in
as well: I often need simplifyDataFrame=FALSE
when the data is nested in a fashion more complex than above (notice how the '"items"' dictionary with two elements is "flattened" into "items.product"
and "items.qty"
, a side-effect of the default action simplifyDataFrame=TRUE
).
There is more to that tutorial, and likely countless other resources as well.