
OK, so I checked and it doesn't seem that anyone has asked this question.

So I have two words:

thiep cuoi
thiệp cưới

The problem is that when I COUNT() these, MySQL treats the two as the same word. For instance, this SQL:

# Let's assume these two words have ids of 1 and 2, and that the column name
# in the table is `word`.

SELECT `word`, COUNT(`word`)
FROM table_name
WHERE `id` IN (1, 2)
GROUP BY `word`;

This returns the two words as a single row with a count of 2. These are not the same string in UTF-8, so how can I get around this behavior in MySQL? Doesn't MySQL group by the UTF-8 value rather than converting to ASCII? :/

Paul Carlton

1 Answer


MySQL uses the collation you set on the column to determine equality of letters in words. In whatever collation you've set on this column, those letters are considered equal for the purposes of comparison. MySQL is not doing any kind of conversion or throwing away any data.

http://dev.mysql.com/doc/refman/5.5/en/charset-general.html
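As a rough illustration of that point, the query below compares the asker's two strings under two different collations. It is only a sketch: it assumes the client connection sends the literals as UTF-8 and that the server provides the utf8_general_ci and utf8_bin collations. The column aliases are made up for the example.

```sql
-- Same two strings, compared under two different collations.
-- Assumption: the connection sends these literals as UTF-8.
SELECT
    _utf8'thiep cuoi' = _utf8'thiệp cưới' COLLATE utf8_general_ci AS equal_general_ci,
    _utf8'thiep cuoi' = _utf8'thiệp cưới' COLLATE utf8_bin        AS equal_bin;
```

Under an accent-insensitive collation the first comparison would be expected to report the strings as equal, and the binary one as different. The stored bytes are identical in both cases; only the comparison rules change.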

You're probably using something common like latin1_general_ci or utf8_general_ci. If you want those letters treated as different, then you probably want a binary collation (utf8_bin, for example). Run a query like SHOW COLLATION LIKE 'utf8%' to see what's available on your server.
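One way to try this without altering the table might look like the sketch below, which reuses the table and column names from the question. It assumes the `word` column uses the utf8 character set (for utf8mb4 the collation name would be utf8mb4_bin instead); the aliases `exact_word` and `occurrences` are hypothetical.

```sql
-- Show each column's current collation (see the Collation column of the output).
SHOW FULL COLUMNS FROM table_name;

-- Group with a binary collation for this query only.
-- Assumption: `word` is stored as utf8; use utf8mb4_bin for a utf8mb4 column.
SELECT `word` COLLATE utf8_bin AS exact_word, COUNT(*) AS occurrences
FROM table_name
WHERE `id` IN (1, 2)
GROUP BY exact_word;
```

With a binary collation the two words should come back as separate rows. To make the change permanent, the column's collation can be changed with ALTER TABLE ... MODIFY, which needs the column's full definition and so is not shown here.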

Dan Grossman