9

We can explicitly set the char set to utf8 when initializing PDO, just add "charset=utf8" to the dsn string. But how does one explicitly specify the collation used in MySQL connection when using PDO?

I don't want to use an additional query to do this:

SET NAMES utf8 COLLATE utf8_unicode_ci;

Is there any way without having to resort to "SET NAMES"? Or, would there be any problem if I don't specify a collation?

datasn.io
  • 12,564
  • 28
  • 113
  • 154
  • @kavoir.com I think I have a shorter answer to your question. It is a year late, but hey, I just swapped over from MySQLi prepared statements. – Anthony Rutledge Sep 13 '15 at 19:07

3 Answers3

10

Here is a two in one answer.

You can set this in the DSN or as MYSQL_ATTR_INIT_COMMAND (connection options).

DSN is better, i think.

$connect = new PDO(
  "mysql:host=$host;dbname=$db;charset=utf8", 
  $user, 
  $pass, 
  array(
    PDO::MYSQL_ATTR_INIT_COMMAND => "SET NAMES utf8"
  )
); 

If you specify UTF-8 you are working with the default collation of utf8_general_ci, unless your db table or field uses something different.

If you want the whole server to respond with this default collation then use configuration directives:

collation_server=utf8_unicode_ci 
character_set_server=utf8

So you don't have to specify it on connection everytime.

The collations affect the sorting of chars and is set on the table and fields in your database. These settings are respected, when querying the table. Make sure they are set. Use UTF-8 names with the collation set in your db.


Your comment:

"People should know char set and collation are 2 different things."

Let's Quote from the MySQL Manual to proof this:

A SET NAMES 'charset_name' statement is equivalent to these three statements:

SET character_set_client = charset_name;
SET character_set_results = charset_name;
SET character_set_connection = charset_name;

Setting character_set_connection to charset_name also implicitly sets collation_connection to the default collation for charset_name.

My answer: It works implicitly, unless your tables changes this explicitly.


Question from comment:

How to make sure I don't mess things up as my tables are not the default collation utf8_general_ci?

Example: Column collation overrides table collation

CREATE TABLE t1
(
    col1 CHAR(10) CHARACTER SET utf8 COLLATE utf8_unicode_ci
) CHARACTER SET latin1 COLLATE latin1_bin;

If both CHARACTER SET X and COLLATE Y are specified on a column, character set X and collation Y are used. The column has character set utf8 and collation utf8_unicode_ci as specified in the table column, while the table is in latin1 + latin1_bin.

Example: in general table collation is used

If collation is not explicitly specified on a column/Field, then the table collation is used:

CREATE TABLE t1
(
    col1 CHAR(10)
) CHARACTER SET latin1 COLLATE latin1_bin;

col1 has collation latin1_bin.

If you want utf8_unicode_ci collation, set it to your tables in general or to the columns/fields.

Jens A. Koch
  • 39,862
  • 13
  • 113
  • 141
  • People should know char set and collation are 2 different things. – datasn.io Sep 12 '14 at 11:35
  • My tables are all in utf8_unicode_ci. I don't have access to configuration directives. Will they be automatically respected? – datasn.io Sep 12 '14 at 12:02
  • i've updated my answer to show that these settings are implicitly respected. **SET NAMES UTF-8 should be enough to implicitly set the collation utf8_unicode_ci**. Hope this works out and maybe this explains, why nobody asks for collation, when working with set names. by the way, it's still possible to do a query with another collation, which would affect the sorting then.. – Jens A. Koch Sep 12 '14 at 12:05
  • ok, updated my answer with two examples. it depends, on what you specify on your tables / columns. – Jens A. Koch Sep 12 '14 at 12:25
  • 1
    @JensA.Koch Why is this accepted as the answer, when clearly only explicitly stating the collation in syntax can satisfy the question? – Anthony Rutledge Sep 13 '15 at 19:06
  • I think the answer was accepted, because i answered the initial question and all additional questions from the comments, which helped the questioner a bit. / One part of the question is `Is there any way without having to resort to "SET NAMES"? Or, would there be any problem if I don't specify a collation?` and i provided a detailed answer with examples, when to leave the things away. So, one part of the answer is: You don't have to (explicitly) specify the COLLATION when using SET NAMES, if your DB is configured correctly. – Jens A. Koch Sep 13 '15 at 21:17
8

Question: "How to specify collation with PDO without SET NAMES? .. how does one explicitly specify the collation used in MySQL connection when using PDO?"

Answer: You just cannot do it without using SET NAMES or something similar. Using PDO::MYSQL_ATTR_INIT_COMMAND in the $options array of the PDO constuctor is the only way to explicitly set the connection collation directly in your connection code using PDO. Otherwise, you will be relying on something less than explicit syntax (which is not the answer to the question). Certainly, any other method is less direct.

Some versions of MySQL (5.1) have two, 3-byte unicode, uft8 collations (unicode and general). Simply using utf8 in the $dsn string will not explicitly choose the "unicode" version or the "general" version of the utf8 collations. PDO is not a mind reader.

Therefore, your options string may look something like this:

$options  = [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION, PDO::ATTR_PERSISTENT => true, PDO::MYSQL_ATTR_INIT_COMMAND => "SET NAMES 'utf8' COLLATE 'utf8_unicode_ci'"];

or

$options  = [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION, PDO::ATTR_PERSISTENT => true, PDO::MYSQL_ATTR_INIT_COMMAND => "SET NAMES 'utf8' COLLATE 'utf8_general_ci'"];

Later versions of MySQL have a 4-byte utf8 unicode implementation. Here, you would specify utf8mb4, not uft8.

$options  = [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION, PDO::ATTR_PERSISTENT => true, PDO::MYSQL_ATTR_INIT_COMMAND => "SET NAMES 'utf8mb4' COLLATE 'utf8mb4_unicode_ci'"];
Anthony Rutledge
  • 6,980
  • 2
  • 39
  • 44
-1

Use this ---Done---- enter image description here

$pdo = new PDO('mysql:host=localhost;dbname=vocabulary;charset=utf-8', "root", "", array(PDO::MYSQL_ATTR_INIT_COMMAND => "SET NAMES utf8"));

and in HTML page use

<meta charset="UTF-8">
Subroto Biswas
  • 553
  • 7
  • 5