I've got a MySQL database with latin1
encoding, and I'm struggling with function SUBSTRING()
which is obviously counting bytes and not characters, as shown by the following scenario:
MySQL [hozana]> set names utf8;
Query OK, 0 rows affected (0.00 sec)
MySQL [hozana]> SELECT SUBSTRING('ééééé', 1, 3);
+-------------------------------+
| SUBSTRING('ééééé', 1, 3) |
+-------------------------------+
| ééé |
+-------------------------------+
Everything normal up to now, let's switch the connection to latin1
encoding.
MySQL [hozana]> set names latin1;
Query OK, 0 rows affected (0.00 sec)
MySQL [hozana]> SELECT SUBSTRING('ééééé', 1, 3);
+-------------------------------+
| SUBSTRING('ééééé', 1, 3) |
+-------------------------------+
| é� |
+-------------------------------+
The only way I found right now, is to convert string to utf-8 before function SUBSTRING() and convert it back to latin1 afterwards. Which is very ugly...
MySQL [hozana]> select convert(cast(convert(substring(convert(cast(convert('éééé' using latin1) as binary) using utf8), 1, 3) using utf8) as binary) using latin1);
+--------------------------------------------------------------------------------------------------------------------------------------------------+
| convert(cast(convert(substring(convert(cast(convert('éééé' using latin1) as binary) using utf8), 1, 3) using utf8) as binary) using latin1) |
+--------------------------------------------------------------------------------------------------------------------------------------------------+
| ééé |
+--------------------------------------------------------------------------------------------------------------------------------------------------+
My question is, which is the right configuration to make in order to have SUBSTRING()
working in latin1
?
Note
Here is the configuration before and after set names
:
MySQL [hozana]> SELECT VERSION();
+-----------+
| VERSION() |
+-----------+
| 5.5.54 |
+-----------+
MySQL [hozana]> set names utf8;
Query OK, 0 rows affected (0.00 sec)
MySQL [hozana]> SHOW SESSION VARIABLES LIKE 'character\_set\_%';
+--------------------------+--------+
| Variable_name | Value |
+--------------------------+--------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | latin1 |
| character_set_system | utf8 |
+--------------------------+--------+
MySQL [hozana]> set names latin1;
Query OK, 0 rows affected (0.00 sec)
MySQL [hozana]> SHOW SESSION VARIABLES LIKE 'character\_set\_%';
+--------------------------+--------+
| Variable_name | Value |
+--------------------------+--------+
| character_set_client | latin1 |
| character_set_connection | latin1 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | latin1 |
| character_set_server | latin1 |
| character_set_system | utf8 |
+--------------------------+--------+