84

I would like to write an SQL query that searches for a keyword in a text field, but only if it is a "whole word match" (e.g. when I search for "rid", it should not match "arid", but it should match "a rid".

I am using MySQL.

Fortunately, performance is not critical in this application, and the database size and string size are both comfortably small, but I would prefer to do it in the SQL than in the PHP driving it.

mike nelson
  • 21,218
  • 14
  • 66
  • 75
Oddthinking
  • 24,359
  • 19
  • 83
  • 121

6 Answers6

164

You can use REGEXP and the [[:<:]] and [[:>:]] word-boundary markers:

SELECT *
FROM table 
WHERE keywords REGEXP '[[:<:]]rid[[:>:]]'

Update for 2020: (actually 2018+)

MySQL updated its RegExp-Engine in version 8.0.4, so you will now need to use the "standard" word boundary marker \b:

SELECT *
FROM table 
WHERE keywords REGEXP '\\brid\\b'

Also be aware that you need to escape the backslash by putting a second backslash.

BlaM
  • 28,465
  • 32
  • 91
  • 105
LukeH
  • 263,068
  • 57
  • 365
  • 409
  • 3
    Just a note, strings using regular expression special characters must be escaped. – Kenston Choi Jul 11 '12 at 13:50
  • 1
    But another issue for the word boundary setup is that it may treat periods as word boundaries, so if you intend to match names, then it may not work as expected. select 'R.C. Sproul' regexp 'R\.C\.'; /*Returns 1*/... select 'R.C. Sproul' regexp '[[:<:]]R\.C\.[[:>:]]' /*Returns 0*/ – Kenston Choi Jul 11 '12 at 13:52
  • 1
    @LukeH - Thanks man. This is awesome. and I have used RLIKE is there any difference between in both REGEX vs RLIKE. – Shail Paras Sep 28 '14 at 06:27
  • 1
    Was useful for me. – Xcoder Mar 22 '16 at 08:06
  • This is what I want. Its saves my day. I have used '[[:<:]]rid' like this to check each words exact starting match. Like full text index. – Muthu17 Feb 19 '18 at 06:04
  • 3
    And just a note for using a php variable in your mysql query: `'[[:<:]]" . $rid . "[[:>:]]'` – stackunderflow May 01 '19 at 20:22
  • hello, please, the update for 2020 is not working in my case, but the `[[:>:]]` approach IS working, I'm using **Server version: 10.4.6-MariaDB mariadb.org binary distribution**. should I update something in order for `\\b` to work?? thanks – Scaramouche Oct 09 '20 at 17:40
  • Be aware that this introduces a bug if you use **accented words** in the database, because REGEXP does bitwise comparisson. So if the user searchs for "coracao", and your record is "coração" (or vice-versa), regexp won't find this item. – Alexandre T. Dec 24 '21 at 13:16
34

Found an answer to prevent the classic word boundary [[::<::]] clashing with special characters eg .@#$%^&*

Replace..

SELECT *
FROM table 
WHERE keywords REGEXP '[[:<:]]rid[[:>:]]'

With this..

SELECT *
FROM table 
WHERE keywords REGEXP '([[:blank:][:punct:]]|^)rid([[:blank:][:punct:]]|$)'

The latter matches (space, tab, etc) || (comma, bracket etc) || start/end of line. A more 'finished' word boundary match.

Ricky Boyce
  • 1,772
  • 21
  • 26
  • This code doesn't work for me. I get: `Error: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ''([[[:blank:][:punct:]]' at line 3 SQLState: 42000 ErrorCode: 1064` Any idéas? – Solver42 Sep 14 '16 at 11:28
  • @Solver42 I just retested the above query with mysql 5.6.21 and it works as normal. Try removing the first bracket from `'([[[` – Ricky Boyce Sep 16 '16 at 01:06
  • I tried that but got the same Error. However this did the trick: ([[:blank:]]|[[:punct:]]|^)rid([[:punct:]}|[[:blank:]]|$) – Solver42 Sep 16 '16 at 07:33
4

Use regexp with word boundaries, but if you want also accent insensitive search, please note that REGEXP is a single-byte operator, so it is Worth nothing to have utf8_general_ci collation, the match will not be accent insensitive.

To have both accent insensitive and whole word match, specify the word written in the same way the (deprecated) PHP function sql_regcase() did.

In fact:

  • utf8_general_ci allows you to make an equality (WHERE field = value) case and accent insensitive search but it doesn't allow you to specify an entire word match (word boundaries markers not recognized)

  • LIKE allows you case and accent insensitive search but you have to manually specify all combinations of possible word boundaries charactes (word boundaries markers not recognized)

  • word boundaries [[:<:]] and [[:>:]] are supported in REGEXP, who is a single byte functions so don't perform accent insensitive search.

The solution is to use REGEXP with word boundaries and the word modified in the way sql_regcase does.

Used on http://www.nonsolodiete.it

Revious
  • 7,816
  • 31
  • 98
  • 147
Marco Marsala
  • 2,332
  • 5
  • 25
  • 39
4

You can use like with the wildcard marker to catch the possibilities (at start, at end, in middle, and alone), something like this should suffice:

select blah blah blah where column like 'rid %' or column like '% rid' or column like '% rid %' or column = 'rid'

paxdiablo
  • 854,327
  • 234
  • 1,573
  • 1,953
  • Depending on the situation, you should also be careful of punctuation. For instance, none of those would return 'rid.' – Greg Leaver Mar 18 '09 at 04:29
  • 4
    I think the query is not sufficient. What about some text like "rid," or "(rid)"? – wenqiang May 25 '10 at 09:01
  • Good answer, good observation, simple solution: you can tailor the query with custom lines to meet the needs of your data. For example, add more lines like: `or column like '% rid, %' or column like 'rid, %'`. Or use the regex method above. – stackunderflow May 01 '19 at 20:24
1
select * from table where Locate('rid ', FieldToSearch) > 0 
      or Locate(' rid', FieldToSearch) > 0

This will handle finding rid where it is preceded or followed by a space, you could extend the approach to take account of .,?! and so on, not elegant but easy.

MrTelly
  • 14,657
  • 1
  • 48
  • 81
0

This is the best answer I've come up myself with so far:

SELECT * FROM table 
WHERE keywords REGEXP '^rid[ $]' OR keywords REGEXP ' rid[ $]'

I would have simplified it to:

SELECT *
FROM table
WHERE keywords REGEXP '[^ ]rid[ $]'

but [^ ] has a special meaning of "NOT a space", rather than "line-beginning or space".

How does REGEXP compare to multiple LIKE conditions? (Not that performance matters in this app.)

Oddthinking
  • 24,359
  • 19
  • 83
  • 121