3

I have a web application which uses mysql.I am using mysql 5.5 and my charset is UTF-8 and collation is utf8_general_ci.I have an issue where if I search for term like Jöhn in my web application I don't get any result,although there is an entry in database as Jöhn.The string is stored correctly in the database.I get the result if I search with 'o' instead of 'ö' (i.e)if I give search term as 'john' I get 'jöhn' in search result.I don't have any other entry like john.The only entry is using the german umlaut i.e jöhn.

S Shruthi
  • 35
  • 4
  • 1
    Possible duplicate of [What's the difference between utf8\_general\_ci and utf8\_unicode\_ci](http://stackoverflow.com/questions/766809/whats-the-difference-between-utf8-general-ci-and-utf8-unicode-ci) – Holger Just Sep 02 '16 at 09:46
  • Please had the `HEX()` of the various Johns. – Rick James Sep 06 '16 at 16:44

2 Answers2

1

Let me walk through the possible situations...

The single character ö (hex C3B6) is usually treated as the same as o in most collations, including utf8_general_ci. You can see this in

SELECT 'o' = 'ö' COLLATE utf8_general_ci;  --> 1 (meaning True)

See utf8 collations . 8 other collations (bin, danish, icelandic, swedish, estonian, turkish, hungarian, and german2) treat them differently.

There is at least one way to write ö as two characters: UNHEX('CC886F') is a 'combining diaeressis' plus 'o'.

SELECT UNHEX('CC886F') = 'o' COLLATE utf8_unicode_ci; --> 1
SELECT UNHEX('CC886F') = 'o' COLLATE utf8_general_ci; --> 0

Back to the question... If the ö is really two characters, then you do need utf8_unicode_ci. If you can get the HEX, check against the hex I listed above.

Probably the only way for "if I search for term like Jöhn in my web application I don't get any result,although there is an entry in database as Jöhn" to happen is the two renderings of the character.

(So I agree with Robin, but I hope this illuminates the "why".)

Rick James
  • 135,179
  • 13
  • 127
  • 222
0

Use collation utf8_unicode_ci. This should solve the problem.

rbr94
  • 2,227
  • 3
  • 23
  • 39