3

I have a table with the character set latin1 (checked by show variables like "character_set_database";) and a default collation of latin1_swedish_ci (checked by SHOW TABLE STATUS;).

I'd like to run a query using the collation latin1_general_cs, which is compiled on my system (checked by Show collation LIKE "%_cs";):

select * from myTab WHERE col RLIKE '[[:upper:]]' COLLATE 'latin1_general_cs' LIMIT 10;

which gives an error:

ERROR 1253 (42000): COLLATION 'latin1_general_cs' is not valid for CHARACTER SET 'utf8'

Even the default collation does not work:

select * from myTab WHERE col RLIKE '[[:upper:]]' COLLATE 'latin1_swedish_ci' LIMIT 10;

error:

ERROR 1253 (42000): COLLATION 'latin1_swedish_ci' is not valid for CHARACTER SET 'utf8'

Whats wrong? is it because my terminal sends data in UTF-8? (Is the "Connection in UTF-8"?) How can I anyway use case sensitive collation? I need it, otherwise it seems impossible to check for uppercase letters.

R_User
  • 10,682
  • 25
  • 79
  • 120

2 Answers2

3

You may find this interesting.

Differences between utf8 and latin1

I can't say I'm an expert in this area, but as far as I can see, UTF8 and Latin1 encode characters differently. So trying to collate UTF8 using Latin1 doesn't make sense.

http://dev.mysql.com/doc/refman/5.0/en/charset-applications.html

Whereas:

CREATE DATABASE mydb
DEFAULT CHARACTER SET utf8
DEFAULT COLLATE utf8_general_ci;

is consistent.

Update

Sorry, I just re-read your question and you say that the table is using latin1. Odd. Have you tried using the utf8 collations?

Community
  • 1
  • 1
Jaydee
  • 4,138
  • 1
  • 19
  • 20
  • 1
    `utf8_general_cs` is not installed/compiled. I have no idea how to install it. But before it would be great to know why it doesn't work. There might be also a way to convert my data, or use another charset when starting mySQL. Now I'm trying to convert the table itself to a the collation `latin1_general_cs` using `alter table myTab convert to character set latin1 collate latin1_general_cs;`. Takes a while, maybe because all indices have to be rewritten – R_User May 15 '13 at 10:05
  • 1
    It seems that `SELECT * FROM myTab WHERE col RLIKE CONVERT(_utf8'[[:upper:]]' USING 'latin1') COLLATE 'latin1_general_cs' LIMIT 10;` works. Probably it is because the connection to mySQL is in UTF8. But it seems that nobody knows,... I'm also connecting to MySQL using R - I have noIdea which charset is used for the connection. I always thought that MySQL does the work and converts automatically. – R_User May 15 '13 at 10:20
0

I think that problem is somewhere else. RLIKE is regexp function COLLATE … have no effect here. Take a look ->

#1 - false
SELECT 'koníček' = 'konicek' COLLATE utf8_czech_ci;

#2 - false
SELECT 'koníček' LIKE 'konicek' COLLATE utf8_czech_ci;

#3 - false
SELECT 'koníček' RLIKE 'konicek' COLLATE utf8_czech_ci;

#4 - true
SELECT 'koníček' = 'konicek' COLLATE utf8_general_ci;

#5 - true
SELECT 'koníček' LIKE 'konicek' COLLATE utf8_general_ci;

#6 - false
SELECT 'koníček' RLIKE 'konicek' COLLATE utf8_general_ci;
iiic
  • 1,366
  • 2
  • 15
  • 23