I setup an InnoDB in mysql 5.7.26. The characterset is utf8mb4, but now i found out some wrong behaviour when filtering by umlauts.
Setup:
mysql> SHOW VARIABLES WHERE Variable_name RLIKE '^(character_set|collation)_' ;
+--------------------------+--------------------+
| Variable_name | Value |
+--------------------------+--------------------+
| character_set_client | utf8mb4 |
| character_set_connection | utf8mb4 |
| character_set_database | utf8mb4 |
| character_set_filesystem | binary |
| character_set_results | utf8mb4 |
| character_set_server | utf8mb4 |
| character_set_system | utf8 |
| collation_connection | utf8mb4_unicode_ci |
| collation_database | utf8mb4_general_ci |
| collation_server | utf8mb4_unicode_ci |
+--------------------------+--------------------+
CREATE TABLE `umlauttest` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`umlautvalue` varchar(100) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=6 DEFAULT CHARSET=utf8mb4;
SELECT * from umlauttest;
+----+-------------+
| id | umlautvalue |
+----+-------------+
| 1 | max |
| 2 | maximilian |
| 3 | maximum |
| 4 | mäx |
| 5 | abcmäxabc |
+----+-------------+
Error
the first three values in the list containing no Umlaut (here the character ä
) but the last two do. Therefore the count for the following queries should be 3 for the first (without umlaut), and 2 for the second (with umlaut).
Unfortunatelly both queries result in a value of 5, which tells me the Umlauts are somehow ignored/not recognized
SELECT COUNT(id) FROM `umlauttest` WHERE `umlautvalue` LIKE '%max%';
SELECT COUNT(id) FROM `umlauttest` WHERE `umlautvalue` LIKE '%mäx%';
Do you have an idea on how this strange behavour could be fixed. Im thankful for any hints pointing me into the right direction.
Best -endo
Edit
To set the collation did the trick:
SELECT COUNT(id) FROM `umlauttest`
WHERE `umlautvalue` LIKE '%mäx%' collate utf8mb4_bin;
thx @Shadow for poining to https://stackoverflow.com/a/47645231/5389997