4

I want to encode Danish characters before sending them to the database.

I apply this function to them:

private function process_array_elements(&$element){
   $element = utf8_encode($element);
   $element = strip_tags(  $element );
   $element = strtolower ( trim ( $element )  );
   $element = mysql_real_escape_string($element);
   return $element;
}

Like this:

$this->check_line_two = $this->process_array_elements($e);

Now, whenever I try to send the string to the database:

mysql_query("SET NAMES 'utf8'");
$query="INSERT INTO  result_scanned
          SET line_one= '$this->check_line_one',
              line_two='$this->check_line_two',
              line_three='$this->check_line_three',
              advert_id='$this->advert_id',
              scanned='$this->scan_result'";

I get this:

 Incorrect string value: '\xE3\x83\xE2\xB8r ...' for column 'line_three' at row 1

The columns in the table use UTF-8 (utf8_unicode_ci), so I must encode my strings the same way.

This thread is related to my question: Detect encoding and make everything UTF-8.

However, I need to know how to encode any character as UTF-8 before inserting it into the database; otherwise I get an error like the one above. I think I first need to identify what kind of characters I am receiving before putting them into the database.

Dmitry Makovetskiyd
  • You do `$this->check_line_two`, but the SQL chokes on `for column 'line_three'` – Eugen Rieck Jun 11 '12 at 12:09
  • Stop using the `mysql_` functions. They're old and it's easy to screw something up. Instead, use [PDO](http://php.net/pdo). Also, your function `process_array_elements` (a misleading name for a method, since it has nothing to do with an array and can process any kind of string, not only one coming from an array) takes `$element` by reference but still returns the result. This is confusing and might cause unexpected side effects. – kba Jun 11 '12 at 12:13
  • Thanks, kristin, although it doesn't relate to the question. – Dmitry Makovetskiyd Jun 11 '12 at 12:15
  • I'm aware, that's why I added it as a comment, not an answer. Still something you should look into, in my opinion. ;-) – kba Jun 11 '12 at 12:15
  • Do not use strtolower, use mb_strtolower (see my answer below) – Prof Dec 19 '12 at 08:18

2 Answers

5

utf8_unicode_ci is the collation; you need to set the character set to utf8 as well:

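-- line_one is only an example column; replace it with your own column definitions.
CREATE TABLE someTable (line_one VARCHAR(255)) DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_unicode_ci;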

For good measure, make sure that everything is UTF-8 when connecting to MySQL with PHP:

mysql_connect ("localhost", "DB_USER", "DB_PASSWORD") or die (mysql_error());
mysql_select_db ("DATABASE_NAME") or die (mysql_error());

mysql_query("SET character_set_client=utf8"); 
mysql_query("SET character_set_connection=utf8"); 
mysql_query("SET character_set_database=utf8"); 
mysql_query("SET character_set_results=utf8"); 
mysql_query("SET character_set_server=utf8"); 
mysql_query("SET NAMES 'utf8'");

This is how I connect on the Hebrew gibberish translation website I run, and it handles everything that I or my users throw at it.
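A comment on the question suggests moving from the `mysql_` functions to PDO. The sketch below shows roughly how the same UTF-8 connection could be requested there: the `charset=utf8` part of the DSN plays the role of `SET NAMES 'utf8'`, and a prepared statement replaces manual escaping. The helper names `build_utf8_dsn` and `insert_line` are hypothetical, the credentials are the placeholders used above, and only the `line_one` column from the question is written, so treat this as an illustration under those assumptions rather than a drop-in replacement.

```php
<?php
// Hypothetical helper: builds a PDO DSN that requests a UTF-8 connection.
function build_utf8_dsn(string $host, string $database): string
{
    // "charset=utf8" asks the MySQL driver to use utf8 for the connection.
    return sprintf('mysql:host=%s;dbname=%s;charset=utf8', $host, $database);
}

// Hypothetical helper: inserts one value into the question's table.
// The bound parameter means no mysql_real_escape_string() call is needed.
function insert_line(PDO $pdo, string $lineOne): void
{
    $stmt = $pdo->prepare('INSERT INTO result_scanned SET line_one = :line_one');
    $stmt->execute([':line_one' => $lineOne]);
}

// Usage (commented out because it needs a running MySQL server):
// $pdo = new PDO(
//     build_utf8_dsn('localhost', 'DATABASE_NAME'),
//     'DB_USER',
//     'DB_PASSWORD',
//     [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]
// );
// insert_line($pdo, 'rødgrød med fløde');
```

Putting the charset in the DSN keeps the encoding choice in one place instead of spreading it over several `SET` queries.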

dotancohen
  • It helps, but there is also a problem: I work with different characters. I think I need to encode some of them before putting them into the database; otherwise I get an error. – Dmitry Makovetskiyd Jun 11 '12 at 12:14
  • Just make sure that the HTML page is also UTF-8, that the MySQL table has the `DEFAULT CHARACTER SET` set properly, and connect to the database as I've shown. Don't try to encode the characters in PHP to save in the database (but please do sanitise them!). Only reencode the characters if you plan on encoding them for an email or to write to a file. Even in those cases, you should consider leaving it as UTF-8 though. – dotancohen Jun 11 '12 at 12:30
  • I get those errors if I don't encode the characters before putting them into the database. – Dmitry Makovetskiyd Jun 11 '12 at 12:40
  • Please post the output of `SHOW CREATE TABLE tableName;` to pastebin and link here. Thanks. – dotancohen Jun 11 '12 at 12:54
2
Incorrect string value: '\xE3\x83\xE2\xB8r ...' for column

I only get this error when I use strtolower.

My database and collation are all UTF-8 and have no problems with characters across the board. I have a language database table that handles Japanese, Russian, Chinese etc. without any issues. This error only popped up when I tried strtolower.

You need to use PHP's mb_strtolower when working with Unicode characters.
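As a rough illustration, the clean-up steps from the question could be rewritten along these lines. The function name `normalize_line` is hypothetical, the sketch assumes the `mbstring` extension is loaded, and it assumes that any input which is not already valid UTF-8 is ISO-8859-1. It converts only when needed, because calling `utf8_encode` on text that is already UTF-8 double-encodes it; the bytes in the error message are consistent with a double-encoded "ø" that a byte-wise `strtolower` then altered. Escaping is left out here and should still happen at the database layer.

```php
<?php
// Hypothetical helper: trims, strips tags and lower-cases a string
// without corrupting multi-byte UTF-8 characters.
function normalize_line(string $element): string
{
    // Convert only if the bytes are not valid UTF-8 already.
    // Assumption: such input is ISO-8859-1 (Latin-1).
    if (!mb_check_encoding($element, 'UTF-8')) {
        $element = mb_convert_encoding($element, 'UTF-8', 'ISO-8859-1');
    }

    // strip_tags() and trim() only touch ASCII bytes here, so they are safe on UTF-8.
    $element = trim(strip_tags($element));

    // mb_strtolower() works on characters, not single bytes.
    return mb_strtolower($element, 'UTF-8');
}

// "\u{00D8}" is the Danish capital letter Ø.
echo normalize_line("  <b>R\u{00D8}DGR\u{00D8}D MED FL\u{00D8}DE</b> "), "\n";
// prints "rødgrød med fløde"
```

Passing the encoding explicitly to `mb_strtolower` avoids depending on the server's `mbstring` default encoding.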

Prof