27

Is there any way to make search query accent insensitive?

the column's and table's collation are utf8_polish_ci and I don't want to change them.

example word : toruń

select * from pages where title like '%torun%'

It doesn't find "toruń". How can I do that?

Okan Kocyigit
  • 13,203
  • 18
  • 70
  • 129

3 Answers3

34

You can change the collation at runtime in the sql query,

...where title like '%torun%' collate utf8_unicode_ci 

but beware that changing the collation on the fly at runtime forgoes the possibility of mysql using an index, so performance on large tables may be terrible.

Or, you can copy the column to another column, such as searchable_title, but change the collation on it. It's actually common to do this type of stuff, where you copy data but have it in some slightly different form that's optimized for some specific workload/purpose. You can use triggers as a nice way to keep the duplicated columns in sync. This method has the potential to perform well, if indexed.

Note - Make sure that your db really has those characters and not html entities. Also, the character set of your connection matters. The above assumes it's set to utf8, for example, via set names like set names utf8

If not, you need an introducer for the literal value

...where title like _utf8'%torun%' collate utf8_unicode_ci 

and of course, the value in the single quotes must actually be utf8 encoded, even if the rest of the sql query isn't.

goat
  • 31,486
  • 7
  • 73
  • 96
  • 1
    Do you have any idea why it doesn't work for Ł (polish L with accent - like in city Łódź)? `...where city like _utf8'%lodz%' collate utf8_general_ci` – suz Jan 08 '16 at 16:17
  • @suz - `Ł` collates after `Z` in utf8_general_ci; equal to `L` for utf8_unicode_520_ci; between `L` and `M` for other collations. MySQL 8.0 has newer collations: http://mysql.rjweb.org/utf8mb4_collations.html – Rick James Oct 10 '19 at 21:39
  • 1
    UPDATE: `utf8_general_ci` is no longer recommended. See [discussion of collations](https://stackoverflow.com/a/766996/199364) for modern alternatives. In hindsight, even when this answer was written `utf8_unicode_ci` was preferable to `utf8_general_ci`. So on an older database, not converted to utf8mb4, `utf8_unicode_ci` would be a better choice, IMHO. – ToolmakerSteve Nov 08 '19 at 10:51
8

This wont work in extreme circumstances, but try to change the column collation to UFT8 utf8_unicode_ci. Then accented characters will be equal to their non-accented counterparts.

Kieran
  • 2,554
  • 3
  • 26
  • 38
1

You could try SOUNDEX:

http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_soundex

This compares two string by how they sound. But this obviously delivers many more results.

Remy
  • 12,555
  • 14
  • 64
  • 104