4

I am using PHP 5.5 and when I attempt to insert a UTF-8 character in the MySQL database PDO cuts it off at the first non-ASCII character.

I have set my connection to be:

(DB_TYPE.':host='.DB_HOST.';dbname='.DB_NAME.';charset=utf8', DB_USER, DB_PASS, array(PDO::ATTR_DEFAULT_FETCH_MODE => PDO::FETCH_ASSOC, PDO::ATTR_ERRMODE => PDO::ERRMODE_WARNING))

I have tried the SET NAMES that everyone posts, but that doesn’t work either because the problem is NOT on the MySQL side of things.

When I do an insert through phpMyAdmin and directly from the MySQL console, it works! When I select the accented string with PDO, it works!

The problem is only on INSERT and UPDATE using PDO specifically!

Here is the SQL of the table. It is all in UTF-8 but maybe someone knows of a conflict between a setting and PDO

CREATE TABLE IF NOT EXISTS `mytable` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `id_lang` int(11) NOT NULL DEFAULT '2',
  `id_tgroup_cat` int(11) NOT NULL,
  `fieldfor` int(11) NOT NULL,
  `colors` varchar(100) NOT NULL,
  `text` text NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB  DEFAULT CHARSET=utf8 AUTO_INCREMENT=34 ;

I have already tried to make text a varchar field and that did not change anything.

The actual insert in PHP:

    $query = $this->db->prepare("UPDATE mytable
                                    SET text = ?,
                                        colors = ?
                                    WHERE id = ?");
    $query->execute(array($text, $colors, $id));

Where $text = "référence" (only saves the letter R in the database but without accents it saves everything) and $colors is an empty string for test purposes and $id is 2.

Giacomo1968
  • 25,759
  • 11
  • 71
  • 103
Jack M.
  • 1,195
  • 13
  • 30

1 Answers1

3

This is the key clue to me:

Where $text = "référence" (only saves the letter R in the database but without accents it saves everything) and $colors is an empty string for test purposes and $id is 2.

Sounds like it is a UTF-8 encoding issue. While the database is UTF-8 the whole chain from the code to the database—including the connection—should be UTF-8 clean.

How exactly does $this->db->prepare relate to the PHP connection to MySQL? A bit unclear from the code you have shown. But based on what you are showing, perhaps adjusting your query like this would help:

$query = $this->db->prepare("SET collation_connection = utf8_bin;
                             SET NAMES utf8;
                             UPDATE mytable
                                SET text = ?,
                                    colors = ?
                                WHERE id = ?");

Or maybe this:

$this->db->exec("SET collation_connection = utf8_bin; SET NAMES utf8;");
$query = $this->db->prepare("UPDATE mytable
                                SET text = ?,
                                    colors = ?
                                WHERE id = ?");

Note my forced-in addition of SET collation_connection = utf8_bin; as well as SET NAMES utf8;

In general you need to make sure your entire chain from the connection, to the database, to the tables is all UTF8 clean. I have a detailed answer to a similar question here.

But in your case, check the actual MySQL server my.cnf file. The following would set the whole chain to UTF-8:

[client]
default-character-set=utf8

[mysql]
default-character-set=utf8

[mysqld]
collation-server = utf8_unicode_ci
init-connect='SET NAMES utf8'
character-set-server = utf8

EDIT: And since the original poster indicates the data is coming from an HTML5 form, I also think checking the BOM (byte order mark) for the actual HTML5 file itself would help as well. It should be set to UTF8. More details on what a BOM is are over here. Specifically the accepted answer from Martin Code which explains:

The UTF-8 BOM is a sequence of bytes (EF BB BF) that allows the reader to identify the file as an UTF-8 file.

Community
  • 1
  • 1
Giacomo1968
  • 25,759
  • 11
  • 71
  • 103
  • As stated in the OP, that doesn't work (and there are better ways to do this through PDO) - it obviously is a utf8 mismatch - I just need to figure out where and why :) – Jack M. May 15 '14 at 17:24
  • 1
    @MrJack Please check my latest edit. If that doesn’t help, fair enough. But one thing missing from your post is a clear picture of where these strings get set first. Do they come from a form? Is the file saved with the BOM to UTF8? More details on that side would help. – Giacomo1968 May 15 '14 at 17:28
  • Hi, yes they come from an HTML5 form with charset specified as the first meta tag. - gonna go check the config files and your other answer now. – Jack M. May 15 '14 at 17:31
  • @MrJack Yes, but please check the HTML5 form itself and make sure the file itself is saved with a UTF8 BOM. – Giacomo1968 May 15 '14 at 17:32
  • Maybe there is something there, it is in ascii, not utf-8. – Jack M. May 15 '14 at 18:10
  • After looking for BOM, there is none. – Jack M. May 15 '14 at 18:52
  • 1
    Ok got it to work by deleting and recreating the file as utf8 – Jack M. May 15 '14 at 19:05
  • @MrJack Good to hear things are fixed! Unclear what system you are on or what editor you were using, but BOM settings are controllable in most developer editing tools. So if you recreated the file it means the new one was made with a proper UTF8 BOM. Plain ASCII is a different thing. – Giacomo1968 May 15 '14 at 19:09
  • 1
    I am on Ubuntu 12.04. It still has no BOM, but now it saves so its all that matters. I was working from a copied file though, so perhaps there was something set wrong with it to start with. – Jack M. May 15 '14 at 19:43
  • 2
    @MrJack “I was working from a copied file though, so perhaps there was something set wrong with it to start with.” Bingo! If you got the file from a Windows machine directly or from an archive that would definitely cause that issue. – Giacomo1968 May 15 '14 at 19:44