Postgres - Loop through a string entry and create join table

Question

I have two tables like so:

books:

id | tags
---+---------------------------------------------------
1  | [philosophy.plato]
2  | [literature.history]
3  | [cultural_history.18th_century.16th_century.history]

tags:

id | name
---+---------------------------------------------------------
1  | literature
2  | history
3  | philosophy
4  | plato

And I am trying to create a join table to cross refer the items in books with their tags. Something like this....

books_tags:

book_id | tag_id
--------+---------------------------------------------------------
   1    | 3
   1    | 4
   2    | 1
   2    | 2

How can I take the books.tags string and convert it into an array, look up each item in the array, and insert it into the join table?

So far I have:

SELECT distinct(s.tag)
FROM books t, unnest(string_to_array(trim (t.tags, '[]'), '.')) s(tag);

Which splits in the string into an array, but how do I loop through each item and insert it into the join table?

what is your desire output? Insert? where? – Juan Carlos Oropeza Jan 23 '16 at 16:35 — Juan Carlos Oropeza, Jan 23 '16 at 16:35
I want to create the table books_tags as shown above. – Stormdamage Jan 23 '16 at 16:42 — Stormdamage, Jan 23 '16 at 16:42

score 2 · Answer 1 · answered Jan 23 '16 at 19:35

You can achieve this with such a query:

WITH books(id, tags) AS (
  VALUES (1::int4, 'philosophy.plato'::text),
    (2, 'literature.history'),
    (3, 'cultural_history.18th_century.16th_century.history')
), tags (id, "name") AS (
  VALUES (1, 'literature'),
    (2, 'history'),
    (3, 'philosophy'),
    (4, 'plato')
)
SELECT b.id book_id,
       t.id tag_id
  FROM (
    SELECT id, regexp_split_to_table(tags, E'\\.') tag
      FROM books) b
  JOIN tags t ON t."name"=b.tag;

Some notes:

It is not good to use name for the column — this is a reserved word. If you still wish to keep, it is better to use double quotes around it.
My WITH constructs match your tables and can be skipped in your case

score 1 · Accepted Answer · edited May 23 '17 at 11:52

Combining unnest() with string_to_array() (like you already had) is typically considerably faster than regexp_split_to_table(), because regular expressions are expensive. Compare:

Error while using regexp_split_to_table (Amazon Redshift)

I suggest a LATERAL join:

What is the difference between LATERAL and a subquery in PostgreSQL?

CREATE TABLE books_tags AS
SELECT b.id AS book_id, t.tag_id
FROM   books b, unnest(string_to_array(trim(b.tags, '[]'), '.')) x(tag)
JOIN   tags  t ON t.name = x.tag
-- GROUP BY 1, 2  -- only if there can be duplicates
-- ORDER BY 1, 2; -- optional, but probably good for performance

"name" is not a reserved word - but still a very bad choice for a column name because it's not descriptive. I would simply use "tag" for the name of a tag.

You probably want to add a unique constraint and foreign keys to complete the many-to-many relationship. More:

How to implement a many-to-many relationship in PostgreSQL?

Postgres - Loop through a string entry and create join table

2 Answers2