I've been having a lot of encoding troubles with PHP/Mongo in general.

Right now, I'm in the process of converting some data from MySQL to Mongo. I have a string that contains an é, but when I try to encode it to UTF-8 (via mb_convert_encoding, utf8_encode), it turns into Ã©. I'm sure other strings also contain other accented characters.

I've tried mb_detect_encoding, which told me the string is UTF-8, but when I do mb_check_encoding($string, 'UTF-8'), it returns false.

Basically, I have no idea what's wrong. This is on a page that is just a PHP script, no HTML. Any advice on this problem, or on maintaining character encoding in general when inserting into Mongo?

Here is the script in question: https://plnkr.co/edit/eAkLxfklzLNCsZTBPKsX

The MySQL table is using a MyISAM engine, charset utf8, collation utf8_unicode_ci

Rohit
  • You're saying it's 'on a page that is just a PHP script' however it is still consumed by the browser? Have you checked your HTML content descriptors? – Mr Rho Jan 23 '16 at 21:22
  • Sorry, I'm not totally sure what you mean. I did set `header('Content-type: text/plain; charset=utf-8');` at the top. – Rohit Jan 23 '16 at 21:27
  • Ok, can you show us your PHP script? The other issue might be the way Mongo is set up vs MySQL. – Mr Rho Jan 23 '16 at 21:35
  • Added the script and MySQL table settings – Rohit Jan 23 '16 at 21:58
  • So the `Ã©` is proper UTF-8 encoding as read by a normal text reader - any extended ASCII characters are converted to two bytes instead of the one byte `é`. I don't think the export from MySQL is incorrect - it contains the correct UTF-8 values. It must be your MongoDB import. I don't actually see where you initialise your mongo database in the PHP script - perhaps you need to explicitly specify your encoding parameters there? – Mr Rho Jan 23 '16 at 22:10
  • When someone enters an accented character in a form (on pages, I have `` set), it shows up in mongo as an accented character. As mentioned, I've been having trouble with encoding. When I store the `é` into Mongo, and then try to display with PHP (using something like `utf8_decode`), it garbles entirely. Not sure what to do anymore. – Rohit Jan 23 '16 at 22:15
  • Perhaps your only mistake is the text/plain - according to this post: http://stackoverflow.com/questions/3682409/reading-utf-8-content-from-mysql-table it should be text/html? Otherwise I'm stumped, sorry. (Awesome name btw ;D) – Mr Rho Jan 23 '16 at 22:21
  • What do you mean you have a string that contains "é"? Where is this string? Is it in your php file? Is it in a form on a html page that you post? Anyway, give us some details. In what encoding it is? Why not utf8? (you say you want to convert it to utf8, so I suppose it's not in utf8 :) – Gavriel Jan 23 '16 at 22:36
  • Mr Rho: I've tried as text/plain, text/html, and without that entirely, no luck. Gavriel, as stated in the question, I'm trying to convert MySQL to Mongo. The string is from MySQL. And as the question states, `mb_detect_encoding` says it's UTF-8 but `mb_check_encoding` says it's not UTF-8. – Rohit Jan 23 '16 at 22:40

1 Answer

  • Do not use the mysql_* API; change to mysqli_*

  • Do not use any mb or utf8 encode/decode routines; they merely hide the 'proper' solution.

  • Right after connecting to MySQL, run `SET NAMES utf8`.

  • Run `SHOW CREATE TABLE` to verify that the table and its columns are `CHARACTER SET utf8` (or `utf8mb4`).
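
The steps above could look roughly like this with mysqli. This is only a minimal sketch: `connect_utf8` is a hypothetical helper name, and the host, credentials, database and table names in the usage comment are placeholders, not taken from the script in the question.

```php
<?php
// Hypothetical helper: all connection parameters are supplied by the caller.
function connect_utf8(string $host, string $user, string $pass, string $dbname): mysqli
{
    // Make mysqli throw exceptions instead of failing silently.
    mysqli_report(MYSQLI_REPORT_ERROR | MYSQLI_REPORT_STRICT);

    $db = new mysqli($host, $user, $pass, $dbname);

    // Tell the server that the client sends and expects UTF-8.
    // This covers what SET NAMES utf8 does; use 'utf8mb4' if the table uses it.
    $db->set_charset('utf8');

    return $db;
}

// Usage (placeholder values, not executed here):
// $db  = connect_utf8('localhost', 'user', 'secret', 'mydb');
// $row = $db->query('SHOW CREATE TABLE my_table')->fetch_row();
// echo $row[1], "\n";   // look for CHARSET=utf8 (or utf8mb4) in the output
```

With the connection charset set this way, the strings read from MySQL should already be valid UTF-8, so no extra encode/decode call is needed before inserting them into Mongo.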

Ã© is the Mojibake for é. It usually indicates a mismatch of latin1 settings and utf8 settings.
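
To see how é ends up as Ã©, here is a small sketch that reproduces the symptom on purpose. It assumes the mbstring extension is available and PHP 7 or later (for the `\u{...}` escape). It is meant only to illustrate the mismatch, not as a way to fix stored data.

```php
<?php
// "é" encoded as UTF-8 is the two bytes C3 A9.
$correct = "\u{00E9}";

// Simulate the mismatch: treat those UTF-8 bytes as latin1 (ISO-8859-1)
// and encode them to UTF-8 a second time.
$mojibake = mb_convert_encoding($correct, 'UTF-8', 'ISO-8859-1');

echo bin2hex($correct), "\n";   // c3a9
echo bin2hex($mojibake), "\n";  // c383c2a9, which displays as "Ã©"

// Undoing the extra conversion gives back the original bytes.
$repaired = mb_convert_encoding($mojibake, 'ISO-8859-1', 'UTF-8');
var_dump($repaired === $correct); // bool(true)
```

Calling an encode function on a string that is already UTF-8 is exactly the double encoding shown here, which is why the encode/decode routines tend to make things worse.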

If using PDO with MySQL: `$db = new PDO('mysql:host=host;dbname=db;charset=utf8', $user, $pwd);` or execute `SET NAMES utf8` right after connecting.

Rick James