1

MySQL 5.6. I can't get a string constant within a view to populate correctly against a database with default UCS2 character set. Works fine on 5.7.

I've created a minimally reproducible example, below.

DROP SCHEMA IF EXISTS test3;
CREATE SCHEMA test3 CHARACTER SET ucs2;
CONNECT test3;

CREATE TABLE testtable (
testname VARCHAR(15)
);

INSERT INTO testTable( testname ) VALUES ('foo');
INSERT INTO testTable( testname ) VALUES ('bar');

CREATE OR REPLACE VIEW testview AS
SELECT * FROM testtable
WHERE testname = 'foo';

SELECT * FROM testview;

^^^ This select statement returns no results.

MySQL [test3]> show create view testview \G
*************************** 1. row ***************************
                View: testview
         Create View: CREATE ALGORITHM=UNDEFINED DEFINER=`root`@`localhost`
SQL SECURITY DEFINER VIEW `testview` AS select `testtable`.`testname` AS 
`testname` from `testtable` where (`testtable`.`testname` = '\0\0\0f\0\0\0o\0\0\0o')
character_set_client: utf8
collation_connection: utf8_general_ci

What is that, utf32??

The following does work, but I don't want to write the collation directly into the statement, as this needs to be portable code and the syntax looks non-standard:

CREATE OR REPLACE VIEW testview AS
SELECT * FROM testtable
WHERE testname = 'foo' COLLATE utf8_general_ci;

I have tried setting the client, connection, and server character sets to ucs2 and utf16 but this changed nothing. Likewise with the collations to *_general_ci.

Any ideas?

Edit:

MySQL [test3]> show variables like "char%";
+--------------------------+------------------------------------------------------------+
| Variable_name            | Value                                                      |
+--------------------------+------------------------------------------------------------+
| character_set_client     | utf8                                                       |
| character_set_connection | utf8                                                       |
| character_set_database   | ucs2                                                       |
| character_set_filesystem | binary                                                     |
| character_set_results    | utf8                                                       |
| character_set_server     | latin1                                                     |
| character_set_system     | utf8                                                       |
| character_sets_dir       | C:\Program Files\MySQL\mysql-5.6.36-winx64\share\charsets\ |
+--------------------------+------------------------------------------------------------+
flatline
  • 42,083
  • 4
  • 31
  • 38

1 Answers1

3

There is essentially no reason to ever use usc2 or utf16 or utf32 in MySQL tables. Use utf8mb4 only. (Or utf8 if you have an old version of MySQL.)

Please provide SHOW VARIABLES LIKE "char%"; Certain things should not be changed:

mysql> SHOW VARIABLES LIKE "char%";
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | utf8mb4                    |
| character_set_connection | utf8mb4                    |
| character_set_database   | utf8mb4                    |
| character_set_filesystem | binary                     | <--
| character_set_results    | utf8mb4                    |
| character_set_server     | utf8mb4                    |
| character_set_system     | utf8                       | <--
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+

When you created the view, you did not set the charset. I can see that from your SHOW when it said:

character_set_client: utf8
Rick James
  • 135,179
  • 13
  • 127
  • 222
  • Thanks. See my edit; I can switch the _server value to ucs2 but it balks trying to set the _client value; but I don't necessarily see any problem with that. Why utf8mb4 over utf32? Either one is 4 bytes per character. ucs2 gives a guaranteed 2 bytes per character. I figured that surrogate pairs would just not be handled properly or could be addressed at application or client level by re-encoding. – flatline Sep 25 '17 at 15:22
  • 1
    `utf32` is 4 bytes for _ever_ character, even English letters. `utf8mb4` (`UTF-8` to the outside world) is variable length, ` byte for English letters, 2 for European accented letters, 3 for Asian charaters, 4 for Emoji and some Chinese. – Rick James Sep 25 '17 at 19:42
  • Furthermore, `utf8mb4` is evolving as _the_ standard. – Rick James Sep 25 '17 at 19:42
  • `ucs2` is a different hack that is uninteresting. And you have to use two of them for certain things -- anything that needs 4 bytes in UTF-8. – Rick James Sep 25 '17 at 19:44
  • Indeed, I had originally thought that ucs2 would buy me something in terms of table size over utf8mb4, but after revisiting it I was incorrect, with the possible exception of char fields but I can live with that. I'm switching to utf8mb4. I am going to leave the issue open for now as the particular problem as written appears to be a bug in 5.6, and is confusing at best.Thanks for the tips! – flatline Sep 25 '17 at 22:29