35

These two querys gives me the exact same result:

select * from topics where name='Harligt';
select * from topics where name='Härligt';

How is this possible? Seems like mysql translates åäö to aao when it searches. Is there some way to turn this off?

I use utf-8 encoding everywhere as far as i know. The same problem occurs both from terminal and from php.

Josh Lee
  • 171,072
  • 38
  • 269
  • 275
Martin
  • 5,197
  • 11
  • 45
  • 60

5 Answers5

39

Yes, this is standard behaviour in the non-language-specific unicode collations.

9.1.13.1. Unicode Character Sets

To further illustrate, the following equalities hold in both utf8_general_ci and utf8_unicode_ci (for the effect this has in comparisons or when doing searches, see Section 9.1.7.7, “Examples of the Effect of Collation”):

Ä = A Ö = O Ü = U

See also Examples of the effect of collation

You need to either

  • use a collation that doesn't have this "feature" (namely utf8_bin, but that has other consequences)

  • use a different collation for the query only. This should work:

     select * from topics where name='Harligt' COLLATE utf8_bin;
    

it becomes more difficult if you want to do a case insensitive LIKE but not have the Ä = A umlaut conversion. I know no mySQL collation that is case insensitive and does not do this kind of implicit umlaut conversion. If anybody knows one, I'd be interested to hear about it.

Related:

Community
  • 1
  • 1
Pekka
  • 442,112
  • 142
  • 972
  • 1,088
5

Since you are in Sweden I'd recommend using the Swedish collation. Here's an example showing the difference it makes:

CREATE TABLE topics (name varchar(100) not null) CHARACTER SET utf8;

INSERT topics (name) VALUES ('Härligt');

select * from topics where name='Harligt';
'Härligt'

select * from topics where name='Härligt';
'Härligt'    

ALTER TABLE topics MODIFY name VARCHAR(100) CHARACTER SET utf8 COLLATE utf8_swedish_ci;

select * from topics where name='Harligt';
<no results>

select * from topics where name='Härligt';
'Härligt'

Note that in this example I only changed the one column to Swedish collation, but you should probably do it for your entire database, all tables, all varchar columns.

Mark Byers
  • 811,555
  • 193
  • 1,581
  • 1,452
  • This should be working for other languages than swedish so i think utf8_bin is the collation i want, but thanks for letting me know that i can change collation on only one column, that will be really helpful. – Martin Apr 09 '10 at 12:27
  • Note that utf8_bin is case sensitive so "härligt" != "Härligt" (applies to unique indexes too). – serbaut Apr 09 '10 at 13:23
2

While collations are one way of solving this, the much more straightforward way seems to me to be the BINARY keyword:

 SELECT 'a' = 'ä', BINARY 'a' = 'ä'

will return 1|0

In your case:

SELECT * FROM topics WHERE BINARY name='Härligt';

See also https://www.w3schools.com/sql/func_mysql_binary.asp

E. Villiger
  • 876
  • 10
  • 27
1

you want to check your collation settings, collation is the property that sets which characters are identical.

these 2 pages should help you

http://dev.mysql.com/doc/refman/5.1/en/charset-general.html

http://dev.mysql.com/doc/refman/5.1/en/charset-mysql.html

chris
  • 9,745
  • 1
  • 27
  • 27
0

Here you can see some collation charts. http://collation-charts.org/mysql60/. I'm no sure which is the used utf8_general_ci though.

Here is the chart for utf8_swedish_ci. It shows which characters it interprets as the same. http://collation-charts.org/mysql60/mysql604.utf8_swedish_ci.html

jiv-e
  • 483
  • 6
  • 8