First of all, here is your sample data loaded into MySQL 5.5.12 on my Windows7 Machine
mysql> DROP DATABASE IF EXISTS lspuk;
Query OK, 1 row affected (0.00 sec)
mysql> CREATE DATABASE lspuk;
Query OK, 1 row affected (0.00 sec)
mysql> USE lspuk
Database changed
mysql> CREATE TABLE items
-> (
-> id int not null auto_increment,
-> description VARCHAR(30),
-> tags VARCHAR(30),
-> primary key (id),
-> FULLTEXT tags_ftndx (tags)
-> ) ENGINE=MyISAM;
Query OK, 0 rows affected (0.04 sec)
mysql> INSERT INTO items (description,tags) VALUES
-> ('the first' ,'tag1 tag3 tag4'),
-> ('the second','tag5 tag1 tag2'),
-> ('the third' ,'tag5 tag1 tag9'),
-> ('the fourth','tag5 tag6 tag2'),
-> ('the fifth' ,'tag4 tag3 tag6'),
-> ('the sixth' ,'tag2 tag3 tag6');
Query OK, 6 rows affected (0.00 sec)
Records: 6 Duplicates: 0 Warnings: 0
mysql>
Please look at the way the tag population is happening in MySQL:
mysql> SELECT 'tag1',COUNT(1) tag_count FROM items WHERE tags LIKE '%tag1%' UNION
-> SELECT 'tag2',COUNT(1) tag_count FROM items WHERE tags LIKE '%tag2%' UNION
-> SELECT 'tag3',COUNT(1) tag_count FROM items WHERE tags LIKE '%tag3%' UNION
-> SELECT 'tag4',COUNT(1) tag_count FROM items WHERE tags LIKE '%tag4%' UNION
-> SELECT 'tag5',COUNT(1) tag_count FROM items WHERE tags LIKE '%tag5%' UNION
-> SELECT 'tag6',COUNT(1) tag_count FROM items WHERE tags LIKE '%tag6%' UNION
-> SELECT 'tag9',COUNT(1) tag_count FROM items WHERE tags LIKE '%tag9%';
+------+-----------+
| tag1 | tag_count |
+------+-----------+
| tag1 | 3 |
| tag2 | 3 |
| tag3 | 3 |
| tag4 | 2 |
| tag5 | 3 |
| tag6 | 3 |
| tag9 | 1 |
+------+-----------+
7 rows in set (0.00 sec)
mysql>
Look carefully and please note the following facts:
- Each row has exactly 3 tags
- The order the tags are requested vs how many of each tag exists seems to govern the score
If you remove tag4 and run the query, you get no score at all
mysql> SELECT *,MATCH(tags) AGAINST ('tag3 tag6') as score FROM items ORDER BY score DESC;
+----+-------------+----------------+-------+
| id | description | tags | score |
+----+-------------+----------------+-------+
| 1 | the first | tag1 tag3 tag4 | 0 |
| 2 | the second | tag5 tag1 tag2 | 0 |
| 3 | the third | tag5 tag1 tag9 | 0 |
| 4 | the fourth | tag5 tag6 tag2 | 0 |
| 5 | the fifth | tag4 tag3 tag6 | 0 |
| 6 | the sixth | tag2 tag3 tag6 | 0 |
+----+-------------+----------------+-------+
6 rows in set (0.00 sec)
The evaluation method seems to be based on the average number of tokens field and presence and/or absence of specific values in a particular order affects scoring. If you apply different styles of scoring and tag specification, note the various scores:
mysql> SELECT *,MATCH(tags) AGAINST ('tag3 tag6 tag4') as score FROM items ORDER BY score DESC;
+----+-------------+----------------+--------------------+
| id | description | tags | score |
+----+-------------+----------------+--------------------+
| 1 | the first | tag1 tag3 tag4 | 0.6700310707092285 |
| 5 | the fifth | tag4 tag3 tag6 | 0.6700310707092285 |
| 2 | the second | tag5 tag1 tag2 | 0 |
| 3 | the third | tag5 tag1 tag9 | 0 |
| 4 | the fourth | tag5 tag6 tag2 | 0 |
| 6 | the sixth | tag2 tag3 tag6 | 0 |
+----+-------------+----------------+--------------------+
6 rows in set (0.00 sec)
mysql> SELECT *,MATCH(tags) AGAINST ('tag3 tag6 tag4' IN BOOLEAN MODE) as score FROM items ORDER BY score DESC;
+----+-------------+----------------+-------+
| id | description | tags | score |
+----+-------------+----------------+-------+
| 5 | the fifth | tag4 tag3 tag6 | 3 |
| 1 | the first | tag1 tag3 tag4 | 2 |
| 6 | the sixth | tag2 tag3 tag6 | 2 |
| 4 | the fourth | tag5 tag6 tag2 | 1 |
| 2 | the second | tag5 tag1 tag2 | 0 |
| 3 | the third | tag5 tag1 tag9 | 0 |
+----+-------------+----------------+-------+
6 rows in set (0.00 sec)
mysql> SELECT *,MATCH(tags) AGAINST ('+tag3 +tag6 +tag4' IN BOOLEAN MODE) as score FROM items ORDER BY score DESC;
+----+-------------+----------------+-------+
| id | description | tags | score |
+----+-------------+----------------+-------+
| 5 | the fifth | tag4 tag3 tag6 | 1 |
| 1 | the first | tag1 tag3 tag4 | 0 |
| 2 | the second | tag5 tag1 tag2 | 0 |
| 3 | the third | tag5 tag1 tag9 | 0 |
| 4 | the fourth | tag5 tag6 tag2 | 0 |
| 6 | the sixth | tag2 tag3 tag6 | 0 |
+----+-------------+----------------+-------+
6 rows in set (0.00 sec)
mysql>
The solution would seem to be evaluate a BOOLEAN MODE score, and then a non-BOOLEAN MODE score as follows:
SELECT *,
MATCH(tags) AGAINST ('tag3 tag6 tag4') as score1,
MATCH(tags) AGAINST ('+tag3 +tag6 +tag4' IN BOOLEAN MODE) as score2
FROM items ORDER BY score2 DESC, score1 DESC;
Here is the result against your sample data:
mysql> SELECT *,
-> MATCH(tags) AGAINST ('tag3 tag6 tag4') as score1,
-> MATCH(tags) AGAINST ('+tag3 +tag6 +tag4' IN BOOLEAN MODE) as score2
-> FROM items ORDER BY score2 DESC, score1 DESC;
+----+-------------+----------------+--------------------+--------+
| id | description | tags | score1 | score2 |
+----+-------------+----------------+--------------------+--------+
| 5 | the fifth | tag4 tag3 tag6 | 0.6700310707092285 | 1 |
| 1 | the first | tag1 tag3 tag4 | 0.6700310707092285 | 0 |
| 2 | the second | tag5 tag1 tag2 | 0 | 0 |
| 3 | the third | tag5 tag1 tag9 | 0 | 0 |
| 4 | the fourth | tag5 tag6 tag2 | 0 | 0 |
| 6 | the sixth | tag2 tag3 tag6 | 0 | 0 |
+----+-------------+----------------+--------------------+--------+
6 rows in set (0.00 sec)
mysql>
or you could try not using the plus signs
mysql> SELECT *,
-> MATCH(tags) AGAINST ('tag3 tag6 tag4') as score1,
-> MATCH(tags) AGAINST ('tag3 tag6 tag4' IN BOOLEAN MODE) as score2
-> FROM items ORDER BY score2 DESC, score1 DESC;
+----+-------------+----------------+--------------------+--------+
| id | description | tags | score1 | score2 |
+----+-------------+----------------+--------------------+--------+
| 5 | the fifth | tag4 tag3 tag6 | 0.6700310707092285 | 3 |
| 1 | the first | tag1 tag3 tag4 | 0.6700310707092285 | 2 |
| 6 | the sixth | tag2 tag3 tag6 | 0 | 2 |
| 4 | the fourth | tag5 tag6 tag2 | 0 | 1 |
| 2 | the second | tag5 tag1 tag2 | 0 | 0 |
| 3 | the third | tag5 tag1 tag9 | 0 | 0 |
+----+-------------+----------------+--------------------+--------+
6 rows in set (0.00 sec)
mysql>
Either way, you will have to incorporate BOOLEAN MODE and non-BOOLEAN mode at the same time.