21

Just read Stefan Gehrig excellent answer to Is "SET CHARACTER SET utf8" necessary?, which goes a bit further than MySQL's documentation at explaining the stages of interpretting and running a query w.r.t. character sets and collations, but I still can't really see the purpose of character_set_connection, or more specifically transcoding the statement from character_set_client into character_set_connection.

Why not just use character_set_client for the query and transcode straight from character_set_client to the character set of the column when comparing with column values? What is the purpose of this intermediate stage? The manual gives the example of comparing literal stings, but why would you want to do this in the first place, let alone in character_set_connection as oppose to character_set_client? Unless my understanding of this (something like "select 'somestr' = 'somestr' from x") is wrong.

Thank you.

Community
  • 1
  • 1
lm713
  • 312
  • 5
  • 16
  • You might get an unsatisfactory answer. I could very well be that the MySQL network protocol did not support transferring the server used encoding and therefor the client needed to know how to interpret the characters coming over the network, and it has not been broken for compatibility reasons. I'm just guessing, this is not an answer. – 0xCAFEBABE Apr 18 '13 at 12:15
  • Thanks for your response, and maybe I do not understand, but I think character_set_results is used for the sending of results and this is chosen by the client. As far as I know from the explanations, character_set_connection is only used internally by MySQL. – lm713 Apr 18 '13 at 12:26
  • I believe this is what MySQL will use when receiving data – Charlie Walton Jun 01 '13 at 03:19

4 Answers4

8

After reading the answers and documentation, I can only think of one use case for character_set_connection (and _collation):

SELECT "StringA" < "StringB"

character_set_client only matters for the transfer to the server. character_set_connection (and the collation, which is not independent from the character set) matters for the interpretation of the statement. Whether "StringA" is less than "StringB" depends on the character set and collation of the literals. A developer might choose a character set/collation which differs from character_set_client.

In practice, character_set_connection won't matter most of the time, because literals are compared to columns, in which case the column's charset and collation is used.

Correct me if I'm wrong!

See https://dev.mysql.com/doc/refman/5.0/en/charset-connection.html:

What character set should the server translate a statement to after receiving it? For this, the server uses the character_set_connection and collation_connection system variables. It converts statements sent by the client from character_set_client to character_set_connection (except for string literals that have an introducer such as _latin1 or _utf8). collation_connection is important for comparisons of literal strings. For comparisons of strings with column values, collation_connection does not matter because columns have their own collation, which has a higher collation precedence.

Christian
  • 4,042
  • 4
  • 26
  • 28
1

The two are different in that character_set_client is assumed to be the charset the statement is sent from the client in, and therefore the charset the server uses to interpret the statement, while character_set_connection is what the server converts the statement into for processing.

character_set_connection is used, as discussed, for comparison of literal strings. This does not necessary mean that both sides of the equation must be literal strings, however. Eg:

WHERE column_name = 'literal_string'
     (charset col)  (charset connection)

If the character sets of the column and the connection are different, the comparison is illegal and will cause an error.

The results (and and response messages) are then encoded into character_set_results for sending back to the client.

Hearth
  • 383
  • 1
  • 4
  • 13
  • this answer misses the critical part of the question: why not convert straight from character_set_client? – goat Jun 24 '13 at 05:19
  • 1
    There is no implicit conversion during the comparison operation, therefore the intermediary stage is necessary in case the client and server (column) are in different charsets. – Hearth Jun 24 '13 at 05:31
  • I interpreted the question as from a developers point of view why do we have this extra option and what does it do? If you want to ask philosophically why the server behaves in the manner it does, that is probably best directed at the MySQL dev team, as it was obviously a design decision at some satge. – Hearth Jun 24 '13 at 05:40
  • From my understanding of the documentation 'literal_string' would be converted to (charset col) for the comparison. Is that not the case? Thanks for your comments everyone. I didn't think I would get an answer. – lm713 Jun 25 '13 at 17:02
  • I do not believe this is done implicitly, as not all charsets are commutable. Could you point me to the place in the documentation you are referring to? I would like to check it and see if I can confirm. – Hearth Jun 25 '13 at 23:16
  • What if the client sent unicode chars using character_set_client=utf8 but the server interprets queries using character_set_connection=latin1? For example in .NET the Encoding library replaces chars that cannot be represented with a question mark character. – Andrew Nov 30 '17 at 09:37
1

It's because MySQL allows every string literal has its own character set. That said, the character set of string literals in a statement is not always the same as the statement's.

See this page in the manual: http://dev.mysql.com/doc/refman/5.5/en/charset-literal.html

Inglis Baderson
  • 779
  • 4
  • 12
-1
> <?php

// ... (create a connection to mysql) ...

mysql_query("SET character_set_results = 'utf8', character_set_client = 'utf8', character_set_connection = 'utf8', character_set_database = 'utf8', character_set_server = 'utf8'", $conn);

$re = mysql_query('SHOW VARIABLES LIKE "%character_set%";')or die(mysql_error());
while ($r = mysql_fetch_assoc($re)) 
{
    var_dump ($r); echo "<br />";
} 

exit;

?>

All important variables are now utf-8 and we can safely use INSERTs or SELECTs with mysql_escape_string($var) without any encoding functions.

Rob
  • 26,989
  • 16
  • 82
  • 98
Mohammed Saqib Rajput
  • 1,331
  • 1
  • 14
  • 22
  • If you set the connection variables via `SET` queries, you **cannot** safely use client-side escaping, and most certainly not with `mysql_escape_string`. You need to use `mysql_set_charset()` client side and then use `mysql_real_escape_string`. Or forgo the deprecated mysql API to begin with. – deceze Nov 06 '13 at 11:53