
The table has `id` and `step_data` columns.

id | step_data
---+----------
a1 | {...}
a2 | {...}
a3 | {...}

step_data is a nested structure like the following, where the actual metadata keys can appear in any of the events objects.

{
  "events": [
    {
      "timestamp": "2021-04-07T17:46:13.739Z",
      "meta": [
        {
          "key": "length",
          "value": "0.898"
        },
        {
          "key": "height",
          "value": "607023104"
        },
        {
          "key": "weight",
          "value": "33509376"
        }
      ]
    },
    {
      "timestamp": "2021-04-07T17:46:13.781Z",
      "meta": [
        {
          "key": "color",
          "value": "0.007"
        },
        {
          "key": "count",
          "value": "641511424"
        },
        {
          "key": "age",
          "value": "0"
        }
      ]
    }
  ]
}

I can extract one field like length pretty easily.

select cast(metadata ->> 'value' as double precision) as length,
       id
from (
         select jsonb_array_elements(jsonb_array_elements(step_data #> '{events}') #> '{meta}') metadata,
                id
         from tbl
     ) as parsed_keys
where metadata @> '{"key": "length"}'::jsonb;

id | length
a1 | 0.898
a2 | 0.800

But what I really need is to extract the metadata for a couple of known keys, like length and color, as columns. I'm not sure how to get another column efficiently once I split the array with jsonb_array_elements().

Is there an efficient way to do this without having to call jsonb_array_elements() again and join on every single key? For example, so that the result set looks like this:

id | length | color | weight
a1 | 0.898  | 0.007 | 33509376
a2 | 0.800  | 1.000 | 15812391

Using Postgres 11.7.
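
For reference, a minimal schema and one sample row matching the structure above might look like this (a sketch; the table name `tbl` is an assumption, used consistently with the queries below):

```sql
-- Hypothetical setup matching the question ("table" is a reserved word,
-- so the table is named "tbl" here):
CREATE TABLE tbl (
   id        text PRIMARY KEY,
   step_data jsonb NOT NULL
);

INSERT INTO tbl (id, step_data) VALUES
('a1', '{"events": [
          {"timestamp": "2021-04-07T17:46:13.739Z",
           "meta": [{"key": "length", "value": "0.898"},
                    {"key": "height", "value": "607023104"},
                    {"key": "weight", "value": "33509376"}]}
        ]}');
```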

Erwin Brandstetter
mvd
  • What if the keys in the `meta` array elements aren't unique? E.g. you have `color` in multiple of them? Something like this: https://pastebin.com/m5Lbn3c6 –  Apr 26 '21 at 19:14
  • @a_horse_with_no_name Using PG 11.7 btw. And query can assume there is only one key overall per JSON structure. If there are more than one then having query fail fast would be great. – mvd Apr 26 '21 at 19:32

3 Answers


This query is an option:

SELECT id,
       MAX(CASE WHEN metadata->>'key' = 'length' THEN metadata->>'value' END) AS length,
       MAX(CASE WHEN metadata->>'key' = 'color'  THEN metadata->>'value' END) AS color,
       MAX(CASE WHEN metadata->>'key' = 'weight' THEN metadata->>'value' END) AS weight
FROM  (SELECT id, jsonb_array_elements(jsonb_array_elements(step_data #> '{events}') #> '{meta}') AS metadata
       FROM tbl t) AS aux
GROUP BY id;
nachospiu

With Postgres 11, I can only think of unnesting both levels, then aggregating back into a key/value pair from which you can extract the desired keys:

select t.id, 
       (x.att ->> 'length')::numeric as length,
       (x.att ->> 'color')::numeric as color,
       (x.att ->> 'weight')::numeric as weight
from the_table t
  cross join lateral (
     select jsonb_object_agg(m.item ->> 'key', m.item -> 'value') as att
     from jsonb_array_elements(t.step_data -> 'events') as e(event)
      cross join jsonb_array_elements(e.event -> 'meta') as m(item)
     where m.item ->> 'key' in ('color', 'length', 'weight')
  ) x
;  

With Postgres 12 you could write it a bit simpler:

select t.id, 
       jsonb_path_query_first(t.step_data, '$.events[*].meta[*] ? (@.key == "length").value') #>> '{}' as length,
       jsonb_path_query_first(t.step_data, '$.events[*].meta[*] ? (@.key == "color").value') #>> '{}' as color,
       jsonb_path_query_first(t.step_data, '$.events[*].meta[*] ? (@.key == "weight").value') #>> '{}' as weight
from the_table t
;  

crosstab()

For any Postgres version.

You can feed the result into the crosstab() function to pivot it. You need the additional module tablefunc installed. If you are unfamiliar with crosstab(), read basic instructions first:

SELECT *
FROM   crosstab(
   $$
   SELECT id, metadata->>'key', metadata ->>'value'
   FROM  (SELECT id, jsonb_array_elements(jsonb_array_elements(step_data -> 'events') -> 'meta') metadata FROM tbl) AS parsed_keys
   ORDER  BY 1
   $$
 , $$VALUES ('length'), ('color'), ('weight')$$
   ) AS ct (id text, length float, color float, weight float);

db<>fiddle here (demonstrating all)

This should deliver the best performance, especially with many columns.

Note how we need no explicit cast to double precision (float): crosstab() processes text input anyway; the result is coerced to the types given in the column definition list.

If one of the keys appears multiple times, the last row wins. (No error is raised, though you indicated you would prefer a fail-fast error.) You can add a deterministic sort order to the query in $1 so the preferred row sorts last. Example, to keep the lowest value per key:

ORDER BY 1, 2, 3 DESC
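
Applied to the query above, that deterministic sort would look like this (a sketch; `tbl` is the assumed table name):

```sql
SELECT *
FROM   crosstab(
   $$
   SELECT id, metadata->>'key', metadata->>'value'
   FROM  (SELECT id, jsonb_array_elements(jsonb_array_elements(step_data -> 'events') -> 'meta') metadata FROM tbl) AS parsed_keys
   ORDER  BY 1, 2, 3 DESC  -- value sorts descending, so the lowest value per key comes last and wins
   $$
 , $$VALUES ('length'), ('color'), ('weight')$$
   ) AS ct (id text, length float, color float, weight float);
```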

Conditional aggregates with FILTER clause

For Postgres 9.4 or newer.


SELECT id
     , min(val) FILTER (WHERE key = 'length') AS length
     , min(val) FILTER (WHERE key = 'color') AS color
     , min(val) FILTER (WHERE key = 'weight') AS weight
FROM (
   SELECT id, metadata->>'key' AS key, (metadata ->>'value')::float AS val
   FROM  (SELECT id, jsonb_array_elements(jsonb_array_elements(step_data -> 'events') -> 'meta') metadata FROM tbl) AS parsed_keys
   ) sub
GROUP  BY id;

In Postgres 12 or later I would compare performance with jsonb_path_query_first(); a_horse provided a solution using it in another answer.

Erwin Brandstetter