2

I need some help parsing json so I can perform analysis on it. It's remote station measurement data that comes with location.

I've got to answer two simple questions and Ive got so far but it's giving me a brain block trying to figure some of the behaviour.

Nested arrays (four deep) is the main issue. I can't work out why lateral joins on jsonb_array_elements calls aren't doing the job.

I'm using postgres 9.4.4

Sample json is below as is the pg/sql I'm trying it with.

Here's the sample json. Anatomy/spec of the data and a pretty version can be found below:

{ "SiteRep": { "Wx": { "Param": [ { "name": "D", "units": "compass", "$": "Wind Direction" }, { "name": "H", "units": "%", "$": "Screen Relative Humidity" } ] }, "DV": { "dataDate": "2014-09-23T20:00:00Z", "type": "O", "Location": [ { "i": "3002", "lat": "60.749", "lon": "-0.854", "name": "BALTASOUND", "country": "SCOTLAND", "continent": "EUROPE", "elevation": "15.0", "Period": [ { "type": "Day", "value": "2014-09-22Z", "Rep": [ { "D": "SSW", "H": "88.5", "$": "1200" }, { "D": "S", "H": "91.6", "$": "1260" }, { "D": "SSW", "H": "92.8", "$": "1320" }, { "D": "W", "H": "92.8", "$": "1380" } ] }, { "type": "Day", "value": "2014-09-23Z", "Rep": [ { "D": "W", "H": "88.5", "$": "0" }, { "D": "W", "H": "86.7", "$": "60" }, { "D": "W", "H": "86.2", "$": "120" }, { "D": "WSW", "H": "87.9", "$": "180" } ] } ] }, { "i": "3005", "lat": "60.139", "lon": "-1.183", "name": "LERWICK (S. SCREEN)", "country": "SCOTLAND", "continent": "EUROPE", "elevation": "82.0", "Period": [ { "type": "Day", "value": "2014-09-22Z", "Rep": [ { "D": "SSW", "H": "95.3", "$": "1200" }, { "D": "SSW", "H": "97.4", "$": "1260" }, { "D": "SW", "H": "97.3", "$": "1320" }, { "D": "W", "H": "94.1", "$": "1380" } ] }, { "type": "Day", "value": "2014-09-23Z", "Rep": [ { "D": "WNW", "H": "89.1", "$": "0" }, { "D": "WNW", "H": "88.0", "$": "60" }, { "D": "W", "H": "90.9", "$": "120" }, { "D": "W", "H": "90.9", "$": "180" } ] } ] } ] } } }

Which I'm loading in with this code:

drop table if exists foo;
create table public.foo (data jsonb);

truncate foo;
copy foo from <example json as above>
DELIMITERS e'\x02';

First specific task is to count the number of "D" keys which have value "S".
Second task is to group these by different values of "$" key.

So I'm just trying to parse the json into pg tables with this:

select
"locations_entries"."date" as "location_date",
"locations_entries"."value" -> 'i' as "i",
"locations_entries"."value" -> 'lat' as "lat",
"locations_entries"."value" -> 'lon' as "lon",
-- "locations_entries"."value" -> 'Period' as "Period",
"locations_entries"."value" -> 'Period' -> 'value' as "p_value",
"locations_entries"."value" -> 'Period' -> 'type' as "type",
left("locations_entries"."value"::text,64) || '...'  as "value"
from
(
select "data" -> 'SiteRep' -> 'DV' -> 'dataDate' as "date",
"data" -> 'SiteRep' -> 'DV' -> 'type' as "type",
"location_entry"."value"
from foo,
lateral jsonb_array_elements(("data" -> 'SiteRep' -> 'DV' -> 'Location')::jsonb) location_entry --,
--lateral jsonb_array_elements("data" -> 'SiteRep' -> 'DV' -> 'Location' -> 'Period' ) period_entry
) "locations_entries"
order by "i"

Note the second lateral join is commented out. More on that in a second

The call above returns two rows:

    location_date      |   i    |   lat    |   lon    | p_value | type |                                value                                
------------------------+--------+----------+----------+---------+------+---------------------------------------------------------------------
"2014-09-23T20:00:00Z" | "3002" | "60.749" | "-0.854" |         |      | {"i": "3002", "lat": "60.749", "lon": "-0.854", "name": "BALTASO...
"2014-09-23T20:00:00Z" | "3005" | "60.139" | "-1.183" |         |      | {"i": "3005", "lat": "60.139", "lon": "-1.183", "name": "LERWICK...

Now if I uncomment the second lateral join it returns zero rows:

 location_date | i | lat | lon | value | type | value 
---------------+---+-----+-----+-------+------+-------
(0 rows)

Yet if I look at what the value of "data" -> 'SiteRep' -> 'DV' -> 'Location' -> 'Period' is to see what I'm calling jsonb_array_elements on it's:

   i    |                               Period                                
--------+---------------------------------------------------------------------
 "3002" | [{"Rep": [{"$": "1200", "D": "SSW", "H": "88.5"}, {"$": "1260", ...
 "3005" | [{"Rep": [{"$": "1200", "D": "SSW", "H": "95.3"}, {"$": "1260", ...

Which (I think) should be fine.

Any ideas what I'm doing wrong and how I can parse the multiple nested arrays into tables?

I tried here and here but can't apply these solutions to my json as they don't go deep enough or I'm misunderstanding my json.

Community
  • 1
  • 1
Montecristo
  • 103
  • 2
  • 8
  • The anatomy of the json can be found at the bottom of the page [here](http://www.metoffice.gov.uk/datapoint/product/uk-hourly-site-specific-observations/detailed-documentation) – Montecristo Sep 22 '15 at 16:10

0 Answers0