3

I'm having a difficult time sorting a char field in MySQL. The problem is that accented characters get mixed up with un-accented characters. For example:

Abc
Ábd
Acc

I thought it may have something to do with collation. So I changed the collation of my table to utf8-ut8_bin, after reading this post. Actually, I altered the table several times to various collations. No cigar.

I should also add that, I don't mind the order of the sort as long as the sort doesn't result in a mixed list. In other words, this is fine:

Ábd
Abc
Acc

and so is this:

Abc
Acc
Ábd

Looking forward to your response.

Community
  • 1
  • 1
hba
  • 7,406
  • 10
  • 63
  • 105

2 Answers2

2

You just need to use a case-sensitive collation, for example: utf8_general_cs.

UPD

I am sorry, it seems there is no utf8_general_cs, utf8_bin should work though.

And you should change the collation of the specific field instead of that of the table (or be sure that the field does use the table defaults).

mysql> SELECT * FROM (
    -> SELECT 'A' as l
    -> UNION ALL
    -> SELECT 'á' as l
    -> UNION ALL
    -> SELECT 'A' as l) ls
    -> ORDER BY l;
+----+
| l  |
+----+
| A  |
| á  |
| A  |
+----+
3 rows in set (0.00 sec)

mysql> SELECT * FROM (
    -> SELECT 'A' as l
    -> UNION ALL
    -> SELECT 'á' as l
    -> UNION ALL
    -> SELECT 'A' as l) ls
    -> ORDER BY l COLLATE utf8_bin;
+----+
| l  |
+----+
| A  |
| A  |
| á  |
+----+
3 rows in set (0.00 sec)
Community
  • 1
  • 1
newtover
  • 31,286
  • 11
  • 84
  • 89
  • Thank you for the detailed response. I had altered the schema and the table to collation utf8_bin and still it wouldn't work. Then I ran your query and got an error. Looked at the column and the column had the default collation. I altered the column to utf8_bin then it worked. Thanks again. – hba Dec 13 '11 at 20:46
  • This approach fails to account for _equality_. – Rick James Jun 06 '17 at 20:18
0

@newtower had a good starting point, but neither he, nor the 'Community' realized that it was inconclusive. So I am providing an answer that should 'fix' it:

unicode_ci (and virtually all other collations):

SET NAMES utf8 COLLATE utf8_unicode_ci;

SELECT GROUP_CONCAT(l SEPARATOR '=') AS gc
  FROM (
    SELECT 'A' as l  UNION ALL
    SELECT 'á' as l  UNION ALL
    SELECT 'A' as l              ) ls
  GROUP BY l
  ORDER BY gc;

+--------+
| gc     |
+--------+
| A=á=A  |
+--------+

bin:

SET NAMES utf8 COLLATE utf8_bin;

SELECT GROUP_CONCAT(l SEPARATOR '=') AS gc
  FROM (
    SELECT 'A' as l  UNION ALL
    SELECT 'á' as l  UNION ALL
    SELECT 'A' as l              ) ls
  GROUP BY l
  ORDER BY gc;

+------+
| gc   |
+------+
| A=A  |
| á    |
+------+

(You could add a DISTINCT in the GROUP_CONCAT to avoid the dup A.)

And here is a full rundown of the utf8 collations (using that technique): http://mysql.rjweb.org/utf8_collations.html

Suggest you click "Affects Me" on https://bugs.mysql.com/bug.php?id=58797

Rick James
  • 135,179
  • 13
  • 127
  • 222