0

I setup an InnoDB in mysql 5.7.26. The characterset is utf8mb4, but now i found out some wrong behaviour when filtering by umlauts.

Setup:

mysql> SHOW VARIABLES WHERE Variable_name RLIKE '^(character_set|collation)_' ;
+--------------------------+--------------------+
| Variable_name            | Value              |
+--------------------------+--------------------+
| character_set_client     | utf8mb4            |
| character_set_connection | utf8mb4            |
| character_set_database   | utf8mb4            |
| character_set_filesystem | binary             |
| character_set_results    | utf8mb4            |
| character_set_server     | utf8mb4            |
| character_set_system     | utf8               |
| collation_connection     | utf8mb4_unicode_ci |
| collation_database       | utf8mb4_general_ci |
| collation_server         | utf8mb4_unicode_ci |
+--------------------------+--------------------+

CREATE TABLE `umlauttest` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `umlautvalue` varchar(100) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=6 DEFAULT CHARSET=utf8mb4;

SELECT * from umlauttest;
+----+-------------+
| id | umlautvalue |
+----+-------------+
|  1 | max         |
|  2 | maximilian  |
|  3 | maximum     |
|  4 | mäx         |
|  5 | abcmäxabc   |
+----+-------------+

Error

the first three values in the list containing no Umlaut (here the character ä) but the last two do. Therefore the count for the following queries should be 3 for the first (without umlaut), and 2 for the second (with umlaut). Unfortunatelly both queries result in a value of 5, which tells me the Umlauts are somehow ignored/not recognized

SELECT COUNT(id) FROM `umlauttest` WHERE `umlautvalue` LIKE '%max%';
SELECT COUNT(id) FROM `umlauttest` WHERE `umlautvalue` LIKE '%mäx%';

Do you have an idea on how this strange behavour could be fixed. Im thankful for any hints pointing me into the right direction.

Best -endo

Edit

To set the collation did the trick:

SELECT COUNT(id) FROM `umlauttest` 
WHERE `umlautvalue` LIKE '%mäx%' collate utf8mb4_bin;

thx @Shadow for poining to https://stackoverflow.com/a/47645231/5389997

endo.anaconda
  • 2,449
  • 4
  • 29
  • 55
  • Provide INSERT INTO with the data, not SELECT output. – Akina Sep 06 '21 at 16:12
  • *Do you have an idea on how this strange behavour could be fixed.* The data in the table is stored using `collation_database` whereas the literal in the query uses `collation_database`. The collation for one of the values is adjusted, and the collation which is used for comparing produces the problem in question. Specify correct collation in the query explicitly. – Akina Sep 06 '21 at 16:15
  • 1
    See the following answer to the duplicate question: https://stackoverflow.com/a/47645231/5389997 – Shadow Sep 06 '21 at 16:59

0 Answers0