-1

I have two tables, Table A and Table B that I am trying to extract information from and create a resulting table.

Table A

user_ids               value

{user_123, user_234}   apples
{user_456, user_123}   oranges
{user_234}             kiwi
Table B

id          name

123         John Smith
234         Jane Doe
456         John Doe

I want to join the two tables in a way that will result in the following:

Table C

user_ids               value    user_names

{user_123, user_234}   apples   {John Smith, Jane Doe}
{user_456, user_123}   oranges  {John Doe, John Smith}
{user_234}             kiwi     {Jane Doe}

Any help would be really appreciated!

sticky bit
  • 36,626
  • 12
  • 31
  • 42
mlenthusiast
  • 1,094
  • 1
  • 12
  • 34
  • Please normalize your database design at least to 3NF. Your design breaks 1NF. – The Impaler May 10 '21 at 23:34
  • Only tag the *one* DBMS you're really using. – sticky bit May 10 '21 at 23:41
  • And besides that, as @TheImpaler already wrote (though you already violate the 1st NF), normalize your schema. See ["Is storing a delimited list in a database column really that bad?"](https://stackoverflow.com/questions/3653462/is-storing-a-delimited-list-in-a-database-column-really-that-bad) (Spoiler: Yes it is.). – sticky bit May 10 '21 at 23:43
  • @stickybit The `user_ids` is probably an array column so it isn't as bad as the "CSV in a column" anti pattern. – mu is too short May 11 '21 at 00:16
  • @muistooshort: It's still a violation of 1st NF unless the array is an atomic unit as far as the DBMS is concerned. But it's not, as we can clearly see. So it's still bad (enough), (simple) foreign key constraints aren't possible and it makes querying the data more complicated than it needs to be. – sticky bit May 11 '21 at 00:23

1 Answers1

2

Others have already encouraged you to normalize your design and there are numerous posts on why this is recommended. Using your current shared dataset, the following was done using postgres where the user_ids was treated as a text array. I also tested with the user_ids as text as used cast(user_ids as text[]) to convert it to a text array

See fiddle and result below:

Schema (PostgreSQL v11)

CREATE TABLE table_a (
  "user_ids" text[],
  "value" VARCHAR(7)
);

INSERT INTO table_a
  ("user_ids", "value")
VALUES
  ('{user_123, user_234}', 'apples'),
  ('{user_456, user_123}', 'oranges'),
  ('{user_234}', 'kiwi');
  


CREATE TABLE table_b (
  "id" INTEGER,
  "name" VARCHAR(10)
);

INSERT INTO table_b
  ("id", "name")
VALUES
  ('123', 'John Smith'),
  ('234', 'Jane Doe'),
  ('456', 'John Doe');

The first CTE user_values creates a row for each user_id and value. The second CTE merged_values joins table_b on the pattern user_<user_id> if it exists and ensures unique results using DISTINCT. The final projection groups based on values and users array_agg to collect all user_ids or names into a single row.

Query #1

WITH user_values AS (
SELECT
   unnest(a.user_ids) user_id,
   a.value
FROM 
   table_a a
),
merged_values AS (
SELECT DISTINCT
   a.user_id,
   a.value,
   b.name
FROM
   user_values a
LEFT JOIN
   table_b b ON a.user_id = CONCAT('user_',b.id)
)
SELECT
    array_agg(user_id) user_ids,
    value,
    array_agg(name) "names"
FROM
    merged_values
GROUP BY
    value;
user_ids value names
user_123,user_456 oranges John Smith,John Doe
user_123,user_234 apples John Smith,Jane Doe
user_234 kiwi Jane Doe

View on DB Fiddle

ggordon
  • 9,790
  • 2
  • 14
  • 27