0

Hi I have one big problem with one MYSQL search. My database TABLE looks like this :

+------+----------+------+
|  id  | keywords | file |
+------+----------+------+

At keywords there are many keywords for each entry seperated with comas. (keyword1,keyword2...).

At PHP array there are listed some keywords (5-10). And my search must get all DB entries which got atleast 3 from those keywords. Its not required to got all of those words! But it can't work and with just one.

Can somebody help me with that query? I don't got good idea how to make it.

Svetoslav
  • 4,686
  • 2
  • 28
  • 43
  • So what does your OTHER table structure look like that you are searching against... or was this IT, and you are just using a string to compare against. If so, poor design on the keywords context. – DRapp Feb 04 '12 at 17:11
  • There isnt other table. I must search at field : Keywords. – Svetoslav Feb 04 '12 at 17:14
  • 4
    @Svetlio: Is the table structure your choice? How did you come with a comma separated values column? It would be good if you read about [normalization](http://en.wikipedia.org/wiki/Database_normalization) and the problems of storing comma separated values in relational tables: [Is storing a comma separated list in a database column really that bad?](http://stackoverflow.com/questions/3653462/is-storing-a-comma-separated-list-in-a-database-column-really-that-bad) – ypercubeᵀᴹ Feb 04 '12 at 17:34
  • @ypercube Excellent Link – Brian Hoover Feb 04 '12 at 17:38

2 Answers2

2

That's a challenge. The brute force method would be to use a UNION in a subquery with a count.

For example,

select id, file, count(*) from
  (select distinct id, file
     from file_table
       where FIND_IN_SET(keyword1, keywords)
   UNION ALL
   select distinct id, file
     from file_table
       where FIND_IN_SET(keyword2, keywords)
   UNION ALL
   select distinct id, file
     from file_table
       where FIND_IN_SET(keyword3, keywords)
   UNION ALL
   select distinct id, file
     from file_table
       where FIND_IN_SET(keyword4, keywords)
   .... MORE UNION ALL ....) as files
group by id, file
having count(*) >= 3

More efficiently, you could have a separate table with keywords and ID, one keyword/ID combo per row. This would eliminate the wildcard search and make the query more efficient.

The next step would be to go to something like ElasticSearch and filter on the score of the result.

Brian Hoover
  • 7,861
  • 2
  • 28
  • 41
  • So, if you search for keyword `man`, it will also count keywords `woman`, `manhood`, `Mane` and `many` more... – ypercubeᵀᴹ Feb 04 '12 at 17:21
  • Didn't say it was a great solution. Just that it would work. The solution to that would be to make sure that every keyword is suffixed with a ',' so the keywords might be 'woman,man,manhood,mane,' and then modify the query to put in the trailing comma each time. – Brian Hoover Feb 04 '12 at 17:28
  • Yes. You can also use the `FIND_IN_SET()` function to identify comma separated values. But **normalizing** the structure by making a many-to-many table as you suggested is the correct way to go. All the others are side-tricks. – ypercubeᵀᴹ Feb 04 '12 at 17:29
  • This search works perfect. :) I dont have problems with searching part of word or how words. – Svetoslav Feb 04 '12 at 17:44
  • @Svetlio: You can change the `keywords like '%keywordX%'` to `FIND_IN_SET(keywordX, keywords)` – ypercubeᵀᴹ Feb 04 '12 at 17:58
0

If you had this setup:

table files:

+------+-------+
| id   | file  |
+------+-------+
| 1000 | foo   |
| 1001 | bar   |
+------+-------+

table keywords:

+----+-------+
| id | word  |
+----+-------+
|  9 | lorem |
| 10 | ipsum |
+----+-------+

table filekeywords:

+----+--------+--------+
| id | fileid | wordid |
+----+--------+--------+
|  1 | 1000   | 9      |
|  2 | 1000   | 10     |
|  3 | 1001   | 10     |
+----+--------+--------+

You could find files with keywords lorem, ipsum, dolor like this:

SELECT COUNT(DISTINCT(k.word)), f.*
FROM files f
INNER JOIN filekeywords fk
  ON fk.fileid = f.id
INNER JOIN keywords k
  ON k.id = fk.wordid
WHERE k.word in ('lorem', 'ipsum', 'dolor')  
GROUP BY f.id
HAVING COUNT(DISTINCT(k.word)) >= 3
unutbu
  • 842,883
  • 184
  • 1,785
  • 1,677