
I'm working on a system where arbitrary incoming data is stored in a table structure that is basically key-value storage.

The real table structure is a bit more complex, but for the sake of solving the problem I will use a simplified example:

CREATE TABLE records (id BIGINT, type TEXT, identifier TEXT);
CREATE TABLE record_items (id BIGINT, rec_id BIGINT, key TEXT, value TEXT);

INSERT INTO records VALUES (10, 'PERSON', 'P12823');
INSERT INTO record_items VALUES 
(11, 10, 'FIRST_NAME', 'James'),
(12, 10, 'LAST_NAME', 'Burwell'),
(13, 10, 'AGE', '76'),
(14, 10, 'HEIGHT', '5-9'),
(15, 10, 'WEIGHT', '149'),
(16, 10, 'BLOOD', 'A+');

INSERT INTO records VALUES (20, 'PERSON', 'P83494');
INSERT INTO record_items VALUES 
(21, 20, 'FIRST_NAME', 'David'),
(22, 20, 'LAST_NAME', 'McDonnell'),
(23, 20, 'AGE', '47'),
(24, 20, 'COMPANY', 'Super Saver Foods');

INSERT INTO records VALUES (30, 'PERSON', 'P81323');
INSERT INTO record_items VALUES 
(31, 30, 'FIRST_NAME', 'William'),
(32, 30, 'LAST_NAME', 'Nelson'),
(33, 30, 'HEIGHT', '5-6'),
(34, 30, 'VEHICLE', '2005 Dodge Charger');

As you can see, the incoming data has arbitrary keys. Some keys are present in every record, while others change from record to record or are missing entirely:

SELECT r.id, r.type, r.identifier, ri.key, ri.value
  FROM records r
  JOIN record_items ri ON ri.rec_id = r.id
 ORDER BY r.id, ri.key

id | type   | identifier | key        | value
---------------------------------------------------------
10 | PERSON | P12823     | AGE        | 76
10 | PERSON | P12823     | BLOOD      | A+
10 | PERSON | P12823     | FIRST_NAME | James
10 | PERSON | P12823     | HEIGHT     | 5-9
10 | PERSON | P12823     | LAST_NAME  | Burwell
10 | PERSON | P12823     | WEIGHT     | 149
20 | PERSON | P83494     | AGE        | 47
20 | PERSON | P83494     | COMPANY    | Super Saver Foods
20 | PERSON | P83494     | FIRST_NAME | David
20 | PERSON | P83494     | LAST_NAME  | McDonnell
30 | PERSON | P81323     | FIRST_NAME | William
30 | PERSON | P81323     | HEIGHT     | 5-6
30 | PERSON | P81323     | LAST_NAME  | Nelson
30 | PERSON | P81323     | VEHICLE    | 2005 Dodge Charger

What we need to get, however, is all these arbitrary rows transposed to columns (order of columns is not important):

id | identifier | first_name | last_name | age | height | weight | company           | vehicle
-------------------------------------------------------------------------------------------------------------
10 | P12823     | James      | Burwell   | 76  | 5-9    | 149    |                   |
20 | P83494     | David      | McDonnell | 47  |        |        | Super Saver Foods |
30 | P81323     | William    | Nelson    |     | 5-6    |        |                   | 2005 Dodge Charger

I am aware of the crosstab function; however, I was not able to make it work the way we need. The main problem is that, as far as I understand, crosstab expects all records to have exactly the same keys in the same order, and any missing or arbitrary keys mix up the result set. Another problem with crosstab is that I have to explicitly define the columns of the result set, whereas what we need is to be able to call the function in a manner similar to this:

SELECT * 
  FROM some_crosstab_like_function('PERSON') -- PERSON identifies record type
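
For illustration, the two-argument form of crosstab (from the tablefunc extension) does cope with missing keys, but the output columns still have to be declared explicitly for every possible key. A call would look roughly like this, which is exactly the manual step we want to avoid:

CREATE EXTENSION IF NOT EXISTS tablefunc;

-- Two-argument crosstab: the category query fixes the set of keys,
-- but the output columns still have to be spelled out by hand.
SELECT *
  FROM crosstab(
       $$SELECT r.id, ri.key, ri.value
           FROM records r
           JOIN record_items ri ON ri.rec_id = r.id
          WHERE r.type = 'PERSON'
          ORDER BY 1$$,
       $$VALUES ('FIRST_NAME'), ('LAST_NAME'), ('AGE'), ('HEIGHT'),
                ('WEIGHT'), ('BLOOD'), ('COMPANY'), ('VEHICLE')$$
       ) AS ct(id BIGINT, first_name TEXT, last_name TEXT, age TEXT,
               height TEXT, weight TEXT, blood TEXT, company TEXT, vehicle TEXT);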

I am thinking of creating some mechanism consisting of functions and temporary tables or views that are created on the fly and return the necessary data. However, before proceeding with this, I wanted to know whether there is an easier solution to my problem.

We are on PostgreSQL 10.5.

Any ideas will be much appreciated.

1 Answer


Create a temporary table with aggregated data in jsonb objects:

create temp table json_data as
select id, identifier, jsonb_object_agg(lower(key), value) as data
from (
    select r.id, r.type, r.identifier, ri.key, ri.value
    from records r
    join record_items ri on ri.rec_id = r.id
    ) s
group by 1, 2;

The table contains data in this format:

select * 
from json_data;

 id | identifier |                                                     data                                                      
----+------------+---------------------------------------------------------------------------------------------------------------
 10 | P12823     | {"age": "76", "blood": "A+", "height": "5-9", "weight": "149", "last_name": "Burwell", "first_name": "James"}
 20 | P83494     | {"age": "47", "company": "Super Saver Foods", "last_name": "McDonnell", "first_name": "David"}
 30 | P81323     | {"height": "5-6", "vehicle": "2005 Dodge Charger", "last_name": "Nelson", "first_name": "William"}
(3 rows)

Use the function described in Flatten aggregated key/value pairs from a JSONB field.
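
In case the link goes stale, a minimal sketch of such a function could look roughly like the following (an approximation of the linked approach, not its exact definition): it collects the distinct keys of the jsonb column and builds a view with one column per key via dynamic SQL.

create or replace function create_jsonb_flat_view
    (table_name text, regular_columns text, json_column text)
    returns text language plpgsql as $$
declare
    cols text;
begin
    -- build a column list like: data->>'age' age, data->>'blood' blood, ...
    execute format($fmt$
        select string_agg(format('%2$s->>%%1$L %%1$I', key), ', ' order by key)
        from (
            select distinct key
            from %1$s, jsonb_each(%2$s)
            ) s
        $fmt$, table_name, json_column)
    into cols;
    -- (re)create the view exposing the regular columns plus the flattened keys
    execute format($fmt$
        drop view if exists %1$s_view;
        create view %1$s_view as
        select %2$s, %3$s
        from %1$s
        $fmt$, table_name, regular_columns, cols);
    return cols;
end $$;

The function returns the generated column list, which is handy for checking what the view will contain.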

select create_jsonb_flat_view('json_data', 'id, identifier', 'data');

select *
from json_data_view

 id | identifier | age | blood |      company      | first_name | height | last_name |      vehicle       | weight 
----+------------+-----+-------+-------------------+------------+--------+-----------+--------------------+--------
 10 | P12823     | 76  | A+    |                   | James      | 5-9    | Burwell   |                    | 149
 20 | P83494     | 47  |       | Super Saver Foods | David      |        | McDonnell |                    | 
 30 | P81323     |     |       |                   | William    | 5-6    | Nelson    | 2005 Dodge Charger | 
(3 rows)

Note: you can (and probably should) modify the function to fit your needs; in particular, you may want to create a temporary view instead of a regular one.
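
For illustration, after changing create view to create temporary view inside the function, the view it generates for this particular data set should be roughly equivalent to:

create temporary view json_data_view as
select id, identifier,
       data->>'age'        as age,
       data->>'blood'      as blood,
       data->>'company'    as company,
       data->>'first_name' as first_name,
       data->>'height'     as height,
       data->>'last_name'  as last_name,
       data->>'vehicle'    as vehicle,
       data->>'weight'     as weight
from json_data;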

klin