Here is my table structure:
// posts
+----+-----------+---------------------+-------------+
| id | title | body | keywords |
+----+-----------+---------------------+-------------+
| 1 | title1 | Something here | php,oop |
| 2 | title2 | Something else | html,css,js |
+----+-----------+---------------------+-------------+
// tags
+----+----------+
| id | name |
+----+----------+
| 1 | php |
| 2 | oop |
| 3 | html |
| 4 | css |
| 5 | js |
+----+----------+
// pivot
+---------+--------+
| post_id | tag_id |
+---------+--------+
| 1 | 1 |
| 1 | 2 |
| 2 | 3 |
| 2 | 4 |
| 2 | 5 |
+---------+--------+
As you see, I store keywords in two ways. Both as string into a column named keywords
and as relational into other tables.
Now I need to select all posts that have specific keywords (for example php
and html
tags). I can do that in two ways:
1: Using unnormalized design:
SELECT * FROM posts WHERE keywords REGEXP 'php|html';
2: Using normalized design:
SELECT posts.id, posts.title, posts.body, posts.keywords
FROM posts
INNER JOIN pivot ON pivot.post_id = posts.id
INNER JOIN tags ON tags.id = pivot.tag_id
WHERE tags.name IN ('html', 'php')
GROUP BY posts.id
See? The second approach uses two JOIN
s. I guess it will be slower than using REGEXP
in huge dataset.
What do you think? I mean what's your recommendation and why?