I have a web application which uses mysql.I am using mysql 5.5 and my charset is UTF-8 and collation is utf8_general_ci.I have an issue where if I search for term like Jöhn in my web application I don't get any result,although there is an entry in database as Jöhn.The string is stored correctly in the database.I get the result if I search with 'o' instead of 'ö' (i.e)if I give search term as 'john' I get 'jöhn' in search result.I don't have any other entry like john.The only entry is using the german umlaut i.e jöhn.
-
1Possible duplicate of [What's the difference between utf8\_general\_ci and utf8\_unicode\_ci](http://stackoverflow.com/questions/766809/whats-the-difference-between-utf8-general-ci-and-utf8-unicode-ci) – Holger Just Sep 02 '16 at 09:46
-
Please had the `HEX()` of the various Johns. – Rick James Sep 06 '16 at 16:44
2 Answers
Let me walk through the possible situations...
The single character ö
(hex C3B6
) is usually treated as the same as o
in most collations, including utf8_general_ci
. You can see this in
SELECT 'o' = 'ö' COLLATE utf8_general_ci; --> 1 (meaning True)
See utf8 collations . 8 other collations (bin, danish, icelandic, swedish, estonian, turkish, hungarian, and german2) treat them differently.
There is at least one way to write ö
as two characters: UNHEX('CC886F')
is a 'combining diaeressis' plus 'o'.
SELECT UNHEX('CC886F') = 'o' COLLATE utf8_unicode_ci; --> 1
SELECT UNHEX('CC886F') = 'o' COLLATE utf8_general_ci; --> 0
Back to the question... If the ö
is really two characters, then you do need utf8_unicode_ci. If you can get the HEX, check against the hex I listed above.
Probably the only way for "if I search for term like Jöhn in my web application I don't get any result,although there is an entry in database as Jöhn" to happen is the two renderings of the character.
(So I agree with Robin, but I hope this illuminates the "why".)

- 135,179
- 13
- 127
- 222