I have a table of the following format:
Table "my_table"
Column | Type | Collation | Nullable |
-----------------------+-----------------------------+-----------+----------+
foreign_key_column_id | integer | | not null |
column1_value_1 | character varying | | not null |
column1_value_2 | character varying | | |
column1_value_3 | character varying | | |
There can be multiple rows with same foreign_key_column_id
.
My goal is to merge all rows of same foreign_key_column_id
into the following format:
foreign_key_column_id
column1::jsonb
So that the resulting array is a combination of distinct non-null values of the three columns. (Then I will insert the resulting dataset into a new table)
I have achieved it by the following, but I am sure this is resource-inefficient (too many functions) and because the table I will run it on has close to 100 million rows, I have to care about performance.
SELECT
foreign_key_column_id,
array_to_json(
array_remove(
array(
SELECT DISTINCT unnest(
array_agg(
array[
column1_value_1,
column1_value_2,
column1_value_3
]
)
)
),
NULL
)
) AS column1
FROM my_table
GROUP BY foreign_key_column_id
My actual, complete situation:
-- initial table
TABLE public.my_table (
id integer NOT NULL,
foreign_key_column_id integer NOT NULL,
column1_value_1 character varying NOT NULL,
column1_value_2 character varying,
column1_value_3 character varying,
column2_value_1 character varying NOT NULL,
column2_value_2 character varying,
column2_value_3 character varying,
column3_value_1 character varying(90) NOT NULL,
column3_value_2 character varying(90),
column3_value_3 character varying(90)
)
-- desired table into which I will insert the aggregated select
TABLE public.my_new_table (
id bigint NOT NULL,
foreign_key_column_id bigint NOT NULL,
headlines jsonb NOT NULL,
descriptions jsonb NOT NULL
);
-- sample data
-[ RECORD 1 ]-------+--
id | 111
foreign_key_column_id | 12
column1_value_1 | Foo
column1_value_2 |
column1_value_3 |
column2_value_1 | Foo %{1}
column2_value_2 |
column2_value_3 | Foo %{3}
column3_value_1 | Foo : Bar
column3_value_2 |
column3_value_3 |
-[ RECORD 2 ]-------+--
id | 222
foreign_key_column_id | 12
column1_value_1 | Bar %{1}
column1_value_3 |
column1_value_2 | Bar %{2}
column2_value_1 | Bar
column2_value_2 |
column2_value_3 |
column3_value_1 | Qux
column3_value_2 |
column3_value_3 |
-[ RECORD 3 ]-------+--
id | 33
foreign_key_column_id | 1234
column1_value_1 | Foo
column1_value_2 |
column1_value_3 |
column2_value_1 | Baz %{1}
column2_value_3 |
column2_value_2 | Baz %{2}
column3_value_1 |
column3_value_2 | Hello world
column3_value_3 |
-- expected output
-[ RECORD 1 ]-------+--
id | 1
foreign_key_column_id | 12
column1 | ["Foo", "Bar %{1}", "Bar %{2}"]
column2 | ["Foo %{1}", "Foo %{3}", "Bar"]
column3 | ["Foo : Bar ", "Qux"]
-[ RECORD 2 ]-------+--
id | 2
foreign_key_column_id | 1234
column1 | ["Foo"]
column2 | ["Baz %{1}", "Baz %{2}"]
column3 | ["Hello world"]