5

I have a table with 1/2 million phrases and I am doing word matching using this query:

SELECT * FROM `searchIndex` WHERE `indexData` RLIKE '[[:<:]]Hirt'

The indexData field has a FULLTEXT index and is datatype longtext.

I want to match on items like

"Alois M. Hirt"

"Show Biz - Al Hirt, in a new role, ..."

"Al Hirt's Sinatraville open 9 p..."

"Hirt will be playing..."

and not on "shirt" or "thirteen" or "thirty" etc.

The query is succeeding but it frequently takes 3 seconds to return and I wondered if there was a better, more efficient way of doing this word boundary match?

If I were to add another index to indexData what would be the correct keylength to use?

TIA

jerrygarciuh
  • 21,158
  • 26
  • 82
  • 139
  • 3
    try to use FULLTEXT search first follow by the regex – ajreal Aug 31 '11 at 18:20
  • 2
    SELECT * FROM `searchIndex` WHERE MATCH (`indexData`) AGAINST ('Hirt*' IN BOOLEAN MODE); – georgepsarakis May 13 '12 at 06:32
  • Thanks setsuna! That client now has an in-house developer. I have passed this along to him. – jerrygarciuh May 13 '12 at 19:33
  • @jerrygarciuh if this question has been answered, then you should either give an answer your own, or delete it altogether. – Paolo Stefan Sep 20 '12 at 09:57
  • 1
    I had similar problem, but in my case phrases could be also multiword. Adding a like check (... LIKE '%phrase%' AND ...) before regexp check increased speed by much (so regexp was checked only when there was phrase in article at all). – Zbyszek Jul 18 '14 at 13:55

2 Answers2

6

No need to have a FULLTEXT index. MySQL has special markers for word boundaries. From the MySQL doc:

[[:<:]], [[:>:]]

These markers stand for word boundaries. They match the beginning and end of words, respectively. A word is a sequence of word characters that is not preceded by or followed by word characters. A word character is an alphanumeric character in the alnum class or an underscore (_).

mysql> SELECT 'a word a' REGEXP '[[:<:]]word[[:>:]]';   -> 1
mysql> SELECT 'a xword a' REGEXP '[[:<:]]word[[:>:]]';  -> 0
Renaud
  • 16,073
  • 6
  • 81
  • 79
1

setsuna's answer worked very well:

SELECT * FROM searchIndex WHERE MATCH (indexData) AGAINST ('Hirt*' IN BOOLEAN MODE);

Community
  • 1
  • 1
jerrygarciuh
  • 21,158
  • 26
  • 82
  • 139