1

I have a table with the following columns:

users( id SERIAL, username VARCHAR(20), password VARCHAR(64), salt VARCHAR(32), name VARCHAR(50), joined TIMESTAMP WITHOUT TIME ZONE, grupo INTEGER )

The database encoding is UTF8.

Pdo connection:

private function __construct(){
        try{
            $this->_pdo = new PDO('pgsql:host=' . Config::get('pgsql/host') . ';port=' . Config::get('pgsql/port') . ';dbname=' . Config::get('pgsql/db') . ';user=' . Config::get('pgsql/username'). ';password=' . Config::get('pgsql/password'));
            $this->_pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
        }catch(PDOException $e){
            die($e->getMessage());
        }
    }

I'm using PDO, and my script is the following, to insert data:

public function query($sql, $params = array()){
        $this->_error = false;
        if($this->_query = $this->_pdo->prepare($sql)){
           $this->_query->execute($params);
        }
}

The sql passed is the following:

INSERT INTO users(username, password, salt, name, joined, grupo) VALUES(?, ?, ?, ?, ?, ?)

And the array passed is:

Array (             [0] => nath 
                    [1] => 81033b63c09fd9104977fdb0ef70b5dc627fd9a6e90d0d400706603def8c22a6 
                    [2] => KwjWC57AO0Gh1VvSUuJpDMNkEiraBzFL 
                    [3] => Nathália 
                    [4] => 2014-03-06 19:35:01 
                    [5] => 1 ) 

When I run this, I get the following error:

SQLSTATE[22021]: Character not in repertoire: 7 ERRO: invalid byte sequence invalid for UTF encode. "UTF8": 0xe1 0x6c 0x69

PS: If I type Nathalia instead of Nathália it works perfectly.

Trying to figure out what was going on, I inserted field by field, like this:

if($this->_query = $this->_pdo->prepare("INSERT INTO users(username) VALUES(?)"){
            $this->_query->execute(array('nath'));
}

And it worked ok. Then I replaced username with password and array('nath') with array('81033b63c09fd9104977fdb0ef70b5dc627fd9a6e90d0d400706603def8c22a6') and the same for the other fields.

Everything worked perfectly when I inserted field by field. Any clues of what is going on?

ahmm95
  • 95
  • 1
  • 9
  • What does your `PDO` constructor look like? Did you add the `charset` param or run `SET NAMES` after connection (and before anyone says this is Postgres and not MySQL, they support the same connection charset parameters)? See this answer - http://stackoverflow.com/a/18250296/283366 – Phil Mar 06 '14 at 23:56
  • @Phil I'll add up there – ahmm95 Mar 06 '14 at 23:58
  • possible duplicate of [How set UTF-8 in PDO class constructor for PHP PgSQL database](http://stackoverflow.com/questions/18250167/how-set-utf-8-in-pdo-class-constructor-for-php-pgsql-database) – Phil Mar 07 '14 at 00:00
  • @Phil I tried to set UTF-8 in PDO connection `$this->_pdo = new PDO('pgsql:host=' . Config::get('pgsql/host') . ';port=' . Config::get('pgsql/port') . ';dbname=' . Config::get('pgsql/db') . ';charset=UTF-8' . ';user=' . Config::get('pgsql/username'). ';password=' . Config::get('pgsql/password'));` and I got this: `SQLSTATE[08006] [7] invalid connection option "charset"` – ahmm95 Mar 07 '14 at 00:11
  • 1
    Then try running a `SET NAMES` query immediately after connection. Also, I believe it's `UTF8` (no hyphen) – Phil Mar 07 '14 at 00:13
  • Like this `$this->_pdo->query("SET NAMES UTF8")` ? – ahmm95 Mar 07 '14 at 00:14
  • I got the same UTF8 error :/ – ahmm95 Mar 07 '14 at 00:15
  • And I think it might be another problem, instead of database encoding, because if that was true, despite of I insert field by field, it wouldn't work. And the fact is that if I insert field by field it works. – ahmm95 Mar 07 '14 at 00:29

1 Answers1

2

The error message is specific about the problem:

0xe1 0x6c 0x69

0xe1 is á in iso-8859-1, not in utf-8.

The other two bytes represent characters in the US-ASCII range (l and i) so they share the same byte-representation in iso-8859-1 and utf-8.

Your script is sending iso-8859-1 encoded text instead of utf-8 encoded text. You should question where you get the Nathália string from, and how it's supposed to be encoded.

If it's supposed to be in utf-8 then it's a bug from the producer of that string. If it's supposed to be in ISO-latin, then your script must apply utf8_encode to it before feeding it to the utf-8 database connection.

Daniel Vérité
  • 58,074
  • 15
  • 129
  • 156