1

I have some data in a postgres table that is a string representation of an array of json data, like this:

[
  {"UsageInfo"=>"P-1008366", "Role"=>"Abstract", "RetailPrice"=>2, "EffectivePrice"=>0},
  {"Role"=>"Text", "ProjectCode"=>"", "PublicationCode"=>"", "RetailPrice"=>2},  
  {"Role"=>"Abstract", "RetailPrice"=>2, "EffectivePrice"=>0, "ParentItemId"=>"396487"}
]

This is is data in one cell from a single column of similar data in my database.

The datatype of this stored in the db is varchar(max).

My goal is to find the average RetailPrice of EVERY json item with "Role"=>"Abstract", including all of the json elements in the array, and all of the rows in the database.

Something like:

SELECT avg(json_extract_path_text(json_item, 'RetailPrice'))
FROM (
  SELECT cast(json_items to varchar[]) as json_item
  FROM my_table
  WHERE json_extract_path_text(json_item, 'Role') like 'Abstract'
)

Now, obviously this particular query wouldn't work for a few reasons. Postgres doesn't let you directly convert a varchar to a varchar[]. Even after I had an array, this query would do nothing to iterate through the array. There are probably other issues with it too, but I hope it helps to clarify what it is I want to get.

Any advice on how to get the average retail price from all of these arrays of json data in the database?

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
johncorser
  • 9,262
  • 17
  • 57
  • 102
  • This is hard, since [Redshift does not seem to support `generate_series()` when tables are involved](http://stackoverflow.com/questions/22759980/generate-series-in-redhsift). If you have a fixed number of json array elements per row or at least a small maximum, you could improvise ... – Erwin Brandstetter Aug 27 '14 at 22:26
  • Also I would expect `{"UsageInfo":"P-1008366" ...` instead of `{"UsageInfo"=>"P-1008366" ...` (`:` instead of `=>`) for valid json. – Erwin Brandstetter Aug 27 '14 at 22:40
  • @Clodoaldo: I reopened this. *Not* a duplicate, since this is for Amazon Redshift and requires a different solution. – Erwin Brandstetter Aug 27 '14 at 23:02
  • @Erwin Why do you think the other is not also Redshift? – Clodoaldo Neto Aug 27 '14 at 23:55
  • @ClodoaldoNeto: Why do you think it is, when tags and text of the questions indicate otherwise? The OP seems to have put some thought into this. Even if the OP did not mean it that way, it should be interesting for the general public to compare solutions. – Erwin Brandstetter Aug 28 '14 at 00:51
  • 2
    Redshift now supports JSON ingestion with json path files http://docs.aws.amazon.com/redshift/latest/dg/r_COPY.html – androboy Sep 18 '14 at 07:55

1 Answers1

1

It does not seem like Redshift would support the json data type per se. At least, I found nothing in the online manual.

But I found a few JSON function in the manual, which should be instrumental:

JSON_ARRAY_LENGTH 
JSON_EXTRACT_ARRAY_ELEMENT_TEXT
JSON_EXTRACT_PATH_TEXT

Since generate_series() is not supported, we have to substitute for that ...

SELECT tbl_id
     , round(avg((json_extract_path_text(elem, 'RetailPrice'))::numeric), 2) AS avg_retail_price
FROM   (
   SELECT *, json_extract_array_element_text(json_items, pos) AS elem
   FROM  (VALUES (0),(1),(2),(3),(4),(5)) a(pos)
   CROSS JOIN tbl
   ) sub
WHERE  json_extract_path_text(elem, 'Role') = 'Abstract'
GROUP  BY 1;
  • I substituted with a poor man's solution: A dummy table counting from 0 to n (the VALUES expression). Make sure you count up to the maximum number of possible elements in your array. If you need this on a regular basis create an actual numbers table.

  • Modern Postgres has much better options, like json_array_elements() to unnest a json array. Compare to your sibling question for Postgres:

I tested in Postgres with the related operator ->>, where it works:
SQL Fiddle.

Community
  • 1
  • 1
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228