8

According to php.net, Stack Overflow and other trusted sources, there are 4 different ways to set UTF-8 on a PDO connection, but I can't find out which one is the best to choose:

$pdo_db = 'mysql:host=localhost;dbname=local_db;charset=utf8'; // METHOD #1
$pdo_login = 'root';
$pdo_pass = 'localpass';

$db = new PDO($pdo_db, $pdo_login, $pdo_pass, array(
    PDO::ATTR_ERRMODE => $localhost ? PDO::ERRMODE_EXCEPTION : PDO::ERRMODE_SILENT,
    PDO::MYSQL_ATTR_INIT_COMMAND => 'SET NAMES utf8', // METHOD #2
));
$db->exec('SET NAMES utf8'); // METHOD #3
$db->exec('SET CHARACTER SET utf8'); // METHOD #4

So, as I understand it, method 1 only works with PHP 5.3+ (and it seems to be a bit buggy), and method 2 is for MySQL only. The difference between methods 3 and 4 is a MySQL matter, but I still don't know which one is better. And is there a way to call SET NAMES through PDO attributes that is not MySQL-specific?

Joan

2 Answers

6

Setting it in the DSN is the only proper way (although it is only supported since PHP 5.3; 5.3.6, to be exact, as the charset option was ignored before that).
You can use this one and SET NAMES at the same time.

All the other ways leave the infamous, half-fictional GBK injection possible.
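
A minimal sketch of the DSN approach, not a definitive implementation. The helper name `build_mysql_pdo_config` is hypothetical, the host, database name and credentials are the placeholders from the question, and picking `PDO::ERRMODE_EXCEPTION` and turning off emulated prepares are assumptions rather than requirements.

```php
<?php
// Hypothetical helper: returns the DSN string and the options array for PDO.
// The charset goes into the DSN so that both the connection and the driver
// know about it (PHP 5.3.6+ only; older versions ignore it).
function build_mysql_pdo_config(string $host, string $dbname, string $charset = 'utf8'): array
{
    $dsn = sprintf('mysql:host=%s;dbname=%s;charset=%s', $host, $dbname, $charset);

    $options = [
        // Assumption: always throw exceptions, whatever the environment.
        PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
        // Assumption: use native prepared statements instead of emulation.
        PDO::ATTR_EMULATE_PREPARES => false,
    ];

    return [$dsn, $options];
}

[$dsn, $options] = build_mysql_pdo_config('localhost', 'local_db');

// Connecting is left commented out because it needs a running MySQL server:
// $db = new PDO($dsn, 'root', 'localpass', $options);
```

If you also have to support PHP older than 5.3.6, `PDO::MYSQL_ATTR_INIT_COMMAND => 'SET NAMES utf8'` can be added to the options next to the DSN charset, as said above.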

Please note that your setting for error_reporting() is utterly wrong. It has to be an unconditional -1. If you are concerned about displaying errors, there is a proper ini setting for that, called display_errors, which can be set at runtime.
error_reporting, on the other hand, sets the level of errors to report and should be at its maximum all the time.
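
To illustrate the split between the two settings, here is a small sketch. The function name `configure_error_handling` and the `$isLocal` flag are hypothetical (the flag plays the role of `$localhost` from the question).

```php
<?php
// Hypothetical helper: report everything, but only display it locally.
function configure_error_handling(bool $isLocal): void
{
    // Error level: always at maximum, regardless of environment.
    error_reporting(-1);

    // Displaying errors is a separate ini setting that can be changed at runtime.
    ini_set('display_errors', $isLocal ? '1' : '0');
}

// Example: a non-local (production-like) environment.
configure_error_handling(false);
```

This way errors are still raised on a live server, they are just not printed to the visitor.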

Your Common Sense
  • So for < 5.3, SET NAMES, COLLATE and co., but for 5.3.0 and later, method #1 alone is really enough? And can I keep both? Oh, and I don't get your point: why does it have to be an unconditional -1? I don't want errors to be displayed when I'm not on localhost… (Thanks for this off-topic advice btw!) – Joan Dec 26 '12 at 15:53
  • It sounds totally right! So the 2nd line becomes `error_reporting(-1); ini_set('display_errors', $localhost);`. – Joan Dec 26 '12 at 16:15
  • Can you explain the second sentence please? I don't quite understand it. – Dharman Aug 12 '20 at 14:56
  • @Dharman there is an old story about [SQL injection that gets around escaping](http://origin.shiflett.org/blog/2006/addslashes-versus-mysql-real-escape-string). Half-fictional because nobody has ever seen it in the wild, especially because the UTF family of encodings is supposedly immune. So, in order to avoid injection even in some exotic encodings, the charset must be set not only for the connection but for the PHP driver as well, which is what mysql_set_charset and the DSN setting do. – Your Common Sense Aug 12 '20 at 15:02
  • @Dharman oh and yes, in case of disabled emulation it doesn't matter at all – Your Common Sense Aug 12 '20 at 15:09
  • Yeah, without emulation I know it makes no difference. I am trying to understand how it was done prior to PHP 5.3. E.g. this post suggested otherwise https://stackoverflow.com/a/4361485/1839439 – Dharman Aug 12 '20 at 15:14
  • @Dharman prior to 5.3, with emulation on, it was outright prone to SQL injection if you had to use GBK or some other encoding of similar renown – Your Common Sense Aug 12 '20 at 15:17
-3

I always put this code in my dbconfig file:

mysql_query("SET character_set_results = 'utf8',
                 character_set_client = 'utf8', 
                 character_set_connection = 'utf8',
                 character_set_database = 'utf8',  
                 character_set_server = 'utf8'");
Ehsan