I've read quite a few similar posts but none solves my case, which could well be because of my lack of sufficient knowledge, so please bear with me.
One of the search options in my terminological dictionary is "whole words only". At first I was using
WHERE ".$source." RLIKE '[[:<:]]".$keyword."[[:>:]]'
However, this failed to match whole words for the first or second $keyword
when there is more than one. Then I found
WHERE ".$source." REGEXP '[[:<:]]".$keyword."[[:>:]]'
and
WHERE ".$source." REGEXP '(^| )".$keyword."( |$)'
while searching these forums
I just tested both of the above in my PhpMyAdmin and found out that the former executes in 0.0740 seconds, while the latter takes twice as long, 0.1440 seconds, so I guess I should stick with the former.
What bothers me the most is the huge discrepancy in results, e.g. searching for a single word ("tool"):
Using the
[[:<:]]
and[[:>:]]
word boundary in PhpMyAdmin returns 34 results.Using
(^| )
and( |$)
in PhpMyAdmin returns 26 results.Running the #1 regexp in my PHP script returns 34 results (this is the correct number).
Here's the whole MySQL block:
foreach($keywords as $keyword) {
$query = $db->query("SELECT * FROM ".DICTIONARY_TABLE." " .
"JOIN ".DICTIONARY_THEMES." ON ".DICTIONARY_TABLE.".theme_id = ".DICTIONARY_THEMES.".theme_id ".
"LEFT JOIN ".DICTIONARY_DEFINITIONS." ON ".DICTIONARY_TABLE.".term_id = ".DICTIONARY_DEFINITIONS.".term_id ".
"WHERE ".DICTIONARY_TABLE.".".$source." REGEXP '(^| )".$keyword."( |$)'".
//"WHERE ".DICTIONARY_TABLE.".".$source." REGEXP '[[:<:]]".$keyword."[[:>:]]'".
" ORDER BY ".DICTIONARY_TABLE.".theme_id, ".DICTIONARY_TABLE.".".$source."");
}
I've commented out the search option I'm not using.
Now, if I try TWO keywords, e.g. "cutting tool", I still get 34 results in the page. I'm unsure if I'm doing this right in PhpMyAdmin:
SELECT * FROM `asphodel_dictionary_terms` WHERE english REGEXP '[[:<:]]cutting[[:>:]]';
SELECT * FROM `asphodel_dictionary_terms` WHERE english REGEXP '[[:<:]]tool[[:>:]]'
This returns 44 results for "cutting" and 34 results for "tool". The query using (^| )
... returns 37 + 26 results respectively.
Any feedback that would help me sort things out would be appreciated!
The database contains a total of 109,000 entries in the main table, there are 82 themes in the DICTIONARY_THEMES
table and 727 entries in the DICTIONARY_DEFINITIONS
table. Not a huge database and it won't grow much bigger.