I am facing some weird collation issue which I really don't understand. I have table containing polish accented letters like "Ł" in two or three columns. Due to occurrence of this letters, utf8_unicode_520_ci collation is most suitable for my purpose. I am facing this issue in my PHP page. Webserver is running Apache 2.4, PHP 7.1.0, MySQL 5.7.
After connecting to database, I am setting charset and collation this way:
$connection->set_charset("utf8");
$connection->query("SET NAMES utf8 COLLATE utf8_unicode_520_ci;");
Afterwards I am double-checking actual charsets and collation using SHOW VARIABLES LIKE '%collat%';
and SHOW VARIABLES LIKE '%char%';
which in formatted form return following information:
collation_connection->utf8_unicode_520_ci
collation_database->utf8_unicode_520_ci
collation_server->binary
character_set_client->utf8
character_set_connection->utf8
character_set_database->utf8
character_set_filesystem->binary
character_set_results->utf8
character_set_server->binary
character_set_system->utf8
character_sets_dir->/usr/share/mysql/charsets/
Until this point everything seems to be corrrect. But i am facing ordering issue when I am running query below:
SELECT birthPlace FROM persons ORDER BY birthPlace DESC LIMIT 5;
which retrun following results:
Łapsze Wyżne
Łapsze Wyżne
Zvolenská Slatina
Župčany
Župčany
Entry "Łapsze Wyżne" should not be listed as last as per utf8_unicode_520_ci collation it should be listed along with entries starting with normal "L" character. So it is evident that it is picking other collation then utf8_unicode_520_ci.I think is utf8_general_ci collation what is used (guessing from results order). Collation of table and column "birthPlace" is utf8_unicode_520_ci. I've expected collation precedence will be default column, table, database respectively.
phpMyAdmin and MySQL Workbench sort entries as expected. But my script is not sorting as expected.
So my question is what collation is really used if it is not explicitly set in query and how to solve it without modifying query if possible.
EDIT: Question itself is not about choosing proper collation as I already know and defined collation which is correct one for my purpose on all places I am able to control. Question is what collation is used if it is not explixitly defined in SELECT query. It is collation of connection, column, table, database, or server? Or it is default collation of used charset (in my case it is utf8_general_ci for uth8 charset? This is not clear to me from documentation. I want to order results by utf8_unicode_520_ci by default without defining it in every query I'll run in future. It is possible?