define similar characters in mysql

Question

in mysql the following query will return 1

select "Khur Ramābād"="Khur Ramabad"

so there is no difference between a and ā in mysql. but ascii code of a=97 and ascii code of ā=196.

is it possible to define same thing for other languages too? for example in Persian "ی" and "ي" are same characters with different character codes and some keyboard layouts use "ی" and some other use "ي".

I know I can control it on my application side and replace all ي with ی in there but this doesn't works for all characters. for example "آ" and ا are same but they can't be replaced by each other.

my problem isn't character set. I already using utf8. as I said in latin text if you search for "Khur Ramabad" and your field contains "Khur Ramābād" mysql will return the result but in Persian if you have "نارنجی" and search for "نارنجي" it will not return any thing because 'ی' and 'ي' have to different character codes.

my problem isn't character set. I already using utf8. as I said in latin text if you search for "Khur Ramabad" and your field contains "Khur Ramābād" mysql will return the result but in Persian if you have "نارنجی" and search for "نارنجي" it will not return any thing because 'ی' and 'ي' have two different character codes.

don't use ascii then. use unicode, which is designed for these kinds of things. — Marc B, Dec 22 '14 at 15:14
@Flimzy sorry my bad explanation. I already using utf8, I wanted to say 'ی' and 'ي' have two different character codes, so in search compare the result will be 0 not 1. — Soheil, Dec 22 '14 at 15:44

score 2 · Accepted Answer · answered Dec 22 '14 at 15:44

The question you ask is about collation. MySQL considers your two roman-character names equal because it is using COLLATE utf8_general_ci (or maybe utf8_swedish_ci; MySQL's roots are in Sweden) as its default collation. That collation considers the accented and unaccented characters to be equal.

Unfortunately, the utf8_persian_ci collation doesn't seem to consider the two alternative characters you have shown to be equal.

See here for a bit of SQL that compares various versions of text strings and shows whether they are equal or not. http://sqlfiddle.com/#!8/eeff3/1/0

I am ignorant of Persian, so I can't tell whether that means the collation is defective. But it might be. If so, you should submit a defect report to https://bugs.mysql.com/

score 1 · Answer 2 · edited May 23 '17 at 12:07

1

Yep. That has to do with the characterset type of the field and the collation type of the table. These two things help set the equivalency tables when doing indexing and querying.

I know that we're supposed to copy the relevant document in to answers, but the documentation is quite extensive, depending on your case. For Persian, try this:

MySQL charset: UTF-8 Unicode (utf8)
MySQL connection collation: utf8_general_ci
your database and table collations are set to: utf8_general_ci or utf8_unicode_ci

http://dev.mysql.com/doc/refman/5.0/en/charset.html

http://forums.mysql.com/read.php?103,209072,209072

Which of utf8 collations is the best?

edited May 23 '17 at 12:07

Community

1
1

answered Dec 22 '14 at 15:17

PaulProgrammer

16,175
4
39
56

unfortunately neither of these queries reruns 1, `select "ی"="ي" collate utf8_persian_ci` and `select "ی"="ي" collate utf8_general_ci` and `select "ی"="ي" collate utf8_unicode_ci` – Soheil Dec 22 '14 at 15:40
Yeah, @OllieJones is right -- it might just be that nobody in the mysql development team knows enough about Persian to set up the equivalency tables. – PaulProgrammer Dec 22 '14 at 15:47

define similar characters in mysql

2 Answers2