Others have already encouraged you to normalize your design and there are numerous posts on why this is recommended. Using your current shared dataset, the following was done using postgres where the user_ids
was treated as a text array. I also tested with the user_ids
as text
as used cast(user_ids as text[])
to convert it to a text
array
See fiddle and result below:
Schema (PostgreSQL v11)
CREATE TABLE table_a (
"user_ids" text[],
"value" VARCHAR(7)
);
INSERT INTO table_a
("user_ids", "value")
VALUES
('{user_123, user_234}', 'apples'),
('{user_456, user_123}', 'oranges'),
('{user_234}', 'kiwi');
CREATE TABLE table_b (
"id" INTEGER,
"name" VARCHAR(10)
);
INSERT INTO table_b
("id", "name")
VALUES
('123', 'John Smith'),
('234', 'Jane Doe'),
('456', 'John Doe');
The first CTE user_values
creates a row for each user_id
and value. The second CTE merged_values
joins table_b
on the pattern user_<user_id>
if it exists and ensures unique results using DISTINCT
. The final projection groups based on values and users array_agg
to collect all user_ids
or names
into a single row.
Query #1
WITH user_values AS (
SELECT
unnest(a.user_ids) user_id,
a.value
FROM
table_a a
),
merged_values AS (
SELECT DISTINCT
a.user_id,
a.value,
b.name
FROM
user_values a
LEFT JOIN
table_b b ON a.user_id = CONCAT('user_',b.id)
)
SELECT
array_agg(user_id) user_ids,
value,
array_agg(name) "names"
FROM
merged_values
GROUP BY
value;
user_ids |
value |
names |
user_123,user_456 |
oranges |
John Smith,John Doe |
user_123,user_234 |
apples |
John Smith,Jane Doe |
user_234 |
kiwi |
Jane Doe |
View on DB Fiddle