MySQL: 'group by' without losing marks

Question

I have a table like this:

name
Smith
Smith
Perez
Pérez

I would like to eliminate duplicates like Smith but preserve both Perez and Pérez (e and é). If I use 'group by', I get two rows (Smith and one of Perez/Pérez), but I would like to get three rows (Smith, Perez, Pérez). The same happens with Sjögren and Sjogren, etc. Thanks
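The difference the question is after can be sketched in Python. Grouping by the exact string keeps every distinct spelling. Grouping by an accent- and case-folded key merges Perez/Pérez and Sjögren/Sjogren, which is roughly what an accent-insensitive collation does. The `fold` helper is a hypothetical illustration (NFD decomposition, then dropping combining marks, then casefolding), not MySQL's actual collation algorithm.

```python
import unicodedata

names = ["Smith", "Smith", "Perez", "Pérez", "Sjögren", "Sjogren"]

def fold(s: str) -> str:
    """Hypothetical accent- and case-insensitive key: decompose to NFD,
    drop combining marks (e.g. the acute accent U+0301), then casefold."""
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).casefold()

# Exact (code point) grouping: "Perez" and "Pérez" stay separate.
exact_groups = sorted(set(names))

# Folded grouping: the first spelling seen represents each folded key,
# much like GROUP BY under an accent-insensitive collation returns one row per group.
folded_groups = {}
for n in names:
    folded_groups.setdefault(fold(n), n)

print(exact_groups)                    # ['Perez', 'Pérez', 'Sjogren', 'Sjögren', 'Smith']
print(sorted(folded_groups.values()))  # ['Perez', 'Sjögren', 'Smith']
```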

My table is in US English but contains a lot of foreign data as well (from many different countries) — FP Towers, Mar 13 '14 at 12:52
Here is a very helpful answer about sorting and character sets in MySQL: http://stackoverflow.com/questions/766809/whats-the-difference-between-utf8-general-ci-and-utf8-unicode-ci — Daniel W., Mar 13 '14 at 14:25

score 1 · Answer 1 · answered Mar 13 '14 at 11:41


1) First, check which character set and collation your table uses:

SELECT table_name, table_collation
FROM information_schema.tables
WHERE table_schema = 'your_database';

2) If it is not utf8 (otherwise skip to step 3), convert your table to the utf8 character set so it supports special characters:

ALTER TABLE your_table CONVERT TO CHARACTER SET utf8;

3) SELECT from your table, grouping with a utf8 collation:

SELECT name FROM your_table GROUP BY name COLLATE utf8_general_ci;
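The collation named in `GROUP BY ... COLLATE` is what decides which values count as duplicates. Here is a sketch of that effect, using Python's built-in `sqlite3` as a stand-in so it runs without a MySQL server. The collation `accent_ci` is a hypothetical name registered only for this demo. Its fold-based rule only approximates an accent-insensitive `_ci` collation and is not MySQL's algorithm. SQLite's default `BINARY` collation compares bytes, so it keeps Perez and Pérez apart.

```python
import sqlite3
import unicodedata

def fold(s):
    # Decompose accented letters, drop combining marks, ignore case.
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).casefold()

def accent_insensitive(a, b):
    """Collation callback: negative, zero or positive, like a C comparator."""
    fa, fb = fold(a), fold(b)
    return (fa > fb) - (fa < fb)

conn = sqlite3.connect(":memory:")
# Hypothetical collation name, registered only for this demo.
conn.create_collation("accent_ci", accent_insensitive)
conn.execute("CREATE TABLE your_table (name TEXT)")
conn.executemany("INSERT INTO your_table (name) VALUES (?)",
                 [("Smith",), ("Smith",), ("Perez",), ("Pérez",)])

# BINARY compares bytes: Smith, Perez and Pérez form three groups.
binary_rows = conn.execute(
    "SELECT name FROM your_table GROUP BY name COLLATE BINARY").fetchall()

# Accent-insensitive: Perez and Pérez collapse into one group.
folded_rows = conn.execute(
    "SELECT name FROM your_table GROUP BY name COLLATE accent_ci").fetchall()

print(len(binary_rows), len(folded_rows))  # 3 2
conn.close()
```

In MySQL, the counterpart of `BINARY` here would be a `_bin` collation such as `utf8_bin`, which compares by code point. It should therefore keep Perez and Pérez in separate groups, although this sketch does not test MySQL itself.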

answered Mar 13 '14 at 11:41

Dimag Kharab


So did you convert your table to the utf8 character set? – Dimag Kharab Mar 13 '14 at 12:57
`utf8_general_ci` does not distinguish between e and é. You need `unicode` support. – Daniel W. Mar 13 '14 at 14:05
@DanFromGermany, do you mean utf8mb4_unicode_ci, as you commented on the other answer? – Dimag Kharab Mar 13 '14 at 14:11
@CodingAnt I mean use at least `utf8_unicode_ci`, not `utf8_general_ci`. – Daniel W. Mar 13 '14 at 14:24

score 1 · Answer 2 · edited Mar 13 '14 at 14:04


Try using utf8_unicode_ci rather than utf8_general_ci; it is based on the Unicode Collation Algorithm and compares strings more accurately.

edited Mar 13 '14 at 14:04

Edper


answered Mar 13 '14 at 14:02

Paul J


I tried utf8_unicode_ci and it still does not differentiate between e and é. Then I tried utf8mb4_unicode_ci and got the same result: Perez and Pérez are still considered the same. – FP Towers Mar 13 '14 at 16:43
