1

I have two tables like so:

books:

id | tags
---+---------------------------------------------------
1  | [philosophy.plato]
2  | [literature.history]
3  | [cultural_history.18th_century.16th_century.history]

tags:

id | name
---+---------------------------------------------------------
1  | literature
2  | history
3  | philosophy
4  | plato

And I am trying to create a join table to cross refer the items in books with their tags. Something like this....

books_tags:

book_id | tag_id
--------+---------------------------------------------------------
   1    | 3
   1    | 4
   2    | 1
   2    | 2

How can I take the books.tags string and convert it into an array, look up each item in the array, and insert it into the join table?

So far I have:

SELECT distinct(s.tag)
FROM books t, unnest(string_to_array(trim (t.tags, '[]'), '.')) s(tag);

Which splits in the string into an array, but how do I loop through each item and insert it into the join table?

Stormdamage
  • 389
  • 1
  • 5
  • 16

2 Answers2

2

You can achieve this with such a query:

WITH books(id, tags) AS (
  VALUES (1::int4, 'philosophy.plato'::text),
    (2, 'literature.history'),
    (3, 'cultural_history.18th_century.16th_century.history')
), tags (id, "name") AS (
  VALUES (1, 'literature'),
    (2, 'history'),
    (3, 'philosophy'),
    (4, 'plato')
)
SELECT b.id book_id,
       t.id tag_id
  FROM (
    SELECT id, regexp_split_to_table(tags, E'\\.') tag
      FROM books) b
  JOIN tags t ON t."name"=b.tag;

Some notes:

  1. It is not good to use name for the column — this is a reserved word. If you still wish to keep, it is better to use double quotes around it.
  2. My WITH constructs match your tables and can be skipped in your case
vyegorov
  • 21,787
  • 7
  • 59
  • 73
1

Combining unnest() with string_to_array() (like you already had) is typically considerably faster than regexp_split_to_table(), because regular expressions are expensive. Compare:

I suggest a LATERAL join:

CREATE TABLE books_tags AS
SELECT b.id AS book_id, t.tag_id
FROM   books b, unnest(string_to_array(trim(b.tags, '[]'), '.')) x(tag)
JOIN   tags  t ON t.name = x.tag
-- GROUP BY 1, 2  -- only if there can be duplicates
-- ORDER BY 1, 2; -- optional, but probably good for performance

"name" is not a reserved word - but still a very bad choice for a column name because it's not descriptive. I would simply use "tag" for the name of a tag.

You probably want to add a unique constraint and foreign keys to complete the many-to-many relationship. More:

Community
  • 1
  • 1
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228