I need some help parsing json so I can perform analysis on it. It's remote station measurement data that comes with location.
I've got to answer two simple questions and Ive got so far but it's giving me a brain block trying to figure some of the behaviour.
Nested arrays (four deep) is the main issue. I can't work out why lateral joins on jsonb_array_elements calls aren't doing the job.
I'm using postgres 9.4.4
Sample json is below as is the pg/sql I'm trying it with.
Here's the sample json. Anatomy/spec of the data and a pretty version can be found below:
{ "SiteRep": { "Wx": { "Param": [ { "name": "D", "units": "compass", "$": "Wind Direction" }, { "name": "H", "units": "%", "$": "Screen Relative Humidity" } ] }, "DV": { "dataDate": "2014-09-23T20:00:00Z", "type": "O", "Location": [ { "i": "3002", "lat": "60.749", "lon": "-0.854", "name": "BALTASOUND", "country": "SCOTLAND", "continent": "EUROPE", "elevation": "15.0", "Period": [ { "type": "Day", "value": "2014-09-22Z", "Rep": [ { "D": "SSW", "H": "88.5", "$": "1200" }, { "D": "S", "H": "91.6", "$": "1260" }, { "D": "SSW", "H": "92.8", "$": "1320" }, { "D": "W", "H": "92.8", "$": "1380" } ] }, { "type": "Day", "value": "2014-09-23Z", "Rep": [ { "D": "W", "H": "88.5", "$": "0" }, { "D": "W", "H": "86.7", "$": "60" }, { "D": "W", "H": "86.2", "$": "120" }, { "D": "WSW", "H": "87.9", "$": "180" } ] } ] }, { "i": "3005", "lat": "60.139", "lon": "-1.183", "name": "LERWICK (S. SCREEN)", "country": "SCOTLAND", "continent": "EUROPE", "elevation": "82.0", "Period": [ { "type": "Day", "value": "2014-09-22Z", "Rep": [ { "D": "SSW", "H": "95.3", "$": "1200" }, { "D": "SSW", "H": "97.4", "$": "1260" }, { "D": "SW", "H": "97.3", "$": "1320" }, { "D": "W", "H": "94.1", "$": "1380" } ] }, { "type": "Day", "value": "2014-09-23Z", "Rep": [ { "D": "WNW", "H": "89.1", "$": "0" }, { "D": "WNW", "H": "88.0", "$": "60" }, { "D": "W", "H": "90.9", "$": "120" }, { "D": "W", "H": "90.9", "$": "180" } ] } ] } ] } } }
Which I'm loading in with this code:
drop table if exists foo;
create table public.foo (data jsonb);
truncate foo;
copy foo from <example json as above>
DELIMITERS e'\x02';
First specific task is to count the number of "D" keys which have value "S".
Second task is to group these by different values of "$" key.
So I'm just trying to parse the json into pg tables with this:
select
"locations_entries"."date" as "location_date",
"locations_entries"."value" -> 'i' as "i",
"locations_entries"."value" -> 'lat' as "lat",
"locations_entries"."value" -> 'lon' as "lon",
-- "locations_entries"."value" -> 'Period' as "Period",
"locations_entries"."value" -> 'Period' -> 'value' as "p_value",
"locations_entries"."value" -> 'Period' -> 'type' as "type",
left("locations_entries"."value"::text,64) || '...' as "value"
from
(
select "data" -> 'SiteRep' -> 'DV' -> 'dataDate' as "date",
"data" -> 'SiteRep' -> 'DV' -> 'type' as "type",
"location_entry"."value"
from foo,
lateral jsonb_array_elements(("data" -> 'SiteRep' -> 'DV' -> 'Location')::jsonb) location_entry --,
--lateral jsonb_array_elements("data" -> 'SiteRep' -> 'DV' -> 'Location' -> 'Period' ) period_entry
) "locations_entries"
order by "i"
Note the second lateral join is commented out. More on that in a second
The call above returns two rows:
location_date | i | lat | lon | p_value | type | value
------------------------+--------+----------+----------+---------+------+---------------------------------------------------------------------
"2014-09-23T20:00:00Z" | "3002" | "60.749" | "-0.854" | | | {"i": "3002", "lat": "60.749", "lon": "-0.854", "name": "BALTASO...
"2014-09-23T20:00:00Z" | "3005" | "60.139" | "-1.183" | | | {"i": "3005", "lat": "60.139", "lon": "-1.183", "name": "LERWICK...
Now if I uncomment the second lateral join it returns zero rows:
location_date | i | lat | lon | value | type | value
---------------+---+-----+-----+-------+------+-------
(0 rows)
Yet if I look at what the value of "data" -> 'SiteRep' -> 'DV' -> 'Location' -> 'Period' is to see what I'm calling jsonb_array_elements on it's:
i | Period
--------+---------------------------------------------------------------------
"3002" | [{"Rep": [{"$": "1200", "D": "SSW", "H": "88.5"}, {"$": "1260", ...
"3005" | [{"Rep": [{"$": "1200", "D": "SSW", "H": "95.3"}, {"$": "1260", ...
Which (I think) should be fine.
Any ideas what I'm doing wrong and how I can parse the multiple nested arrays into tables?
I tried here and here but can't apply these solutions to my json as they don't go deep enough or I'm misunderstanding my json.