Can anyone tell me ways to do this kind of search in a database?
I got these tables:
posts (id, tags_cache)
tags (id, name)
posts_tags (post_id, tag_id)
The user enters a search query (say "water blue") and I want to show the posts that have both tags. The only way I can think of to search is using FIND_IN_SET, this way:
SELECT p.*, GROUP_CONCAT(t.name) AS tags_search
FROM posts p
LEFT JOIN posts_tags pt ON p.id = pt.post_id
LEFT JOIN tags t ON pt.tag_id = t.id
GROUP BY p.id
HAVING FIND_IN_SET('water', tags_search) > 0
AND FIND_IN_SET('blue', tags_search) > 0
The posts.tags_cache
text column stores the names and id of the tags it belongs to (this way: water:15 blue:20
).
To avoid JOINs by using this column for search, I've tried LIKE and INSTR but these will give inexact results since you can search for "ter" and you'll gets posts tagged 'water' and 'termal' for example. I've also tried REGEXP which gives exact results, but it's a slow process.
I can't use MATCH as tables use InnoDB.
So... is or are there other ways to accomplish this?
[Edit]
I forgot to mention that the user could search for many tags (not just 2), and even exclude tags: search posts tagged 'water' but not 'blue'. With FIND_IN_SET this works for me:
HAVING FIND_IN_SET('water', tags_search) > 0
AND NOT FIND_IN_SET('blue', tags_search) > 0
[Edit2]
I did some performance test (i.e. only checked how long the queries took, cached) as ypercube suggested, and these are the results:
muists | Bill K | ypercu | includes:excludes
--------------------------
0.0137 | 0.0009 | 0.0029 | 2:0
0.0096 | 0.0081 | 0.0033 | 2:1
0.0111 | 0.0174 | 0.0033 | 2:2
0.0281 | 0.0081 | 0.0025 | 5:1
0.0014 | 0.0013 | 0.0015 | 0:2
I don't know if this info is valid resource... But it shows that ypercube's method with a JOIN per tag is the quickest.