This can be handled by using the MySQL collation system.
For example, the following query returns a true
(1
) value:
SELECT 'LÜTTGE' COLLATE utf8_general_ci = 'LUTTGE'
Accordingly, if you set the column's character set to utf8
and its collation to utf8_general_ci
you will get the result you mention with umlaut characters.
The default collation in MySQL reflects its Swedish origin and is utf8_swedish_ci. In Swedish, Ü and U are not the same letter. You probably have used the default collation for your columns.
The utf8_general_ci
collation handles matching 'Eßen' to 'Esen' but not to 'Essen'. It handles matching 'LÜTTGE' to 'LUTTGE' but not to 'Luettge', unfortunately.
On the other hand, the utf8_german2_ci
collation matches 'Eßen' to 'Essen' and 'LÜTTGE' to 'LUETTGE'. If your users are accustomed to using ASCII transliterations of German characters you may wish to explore your choices here. One of them is to use a query with OR
SELECT whatever
FROM table
WHERE ( namen COLLATE utf8_general_ci = 'LUTTGE'
OR namen COLLATE utf8_german2_ci = 'LUTTGE' )
It can get more complex if you need to handle Spanish, because Ñ
is considered a different letter from N
. You may need to do some explaining for your users.
Marcus suggested using the utf_unicode_ci
collation. That will handle things partially too. Here are the cases
type utf8_general_ci utf8_german2_ci utf8_unicode_ci utf8_spanish_ci
'Eßen' to 'Esen' substitute match no match no match no match
'Eßen' to 'Essen' transliterate no match match match match
'LÜTTGE' to 'LUTTGE' substitute match no match match match
'LÜTTGE' to 'LUETTGE' transliterate no match match no match no match
'Niño' to 'Nino' transliterate match match match no match
So you still need some extra work to handle transliterations.