57

I'm calling json_encode() on data that comes from a MySQL database with utf8_general_ci collation. The problem is that some rows have weird data which I can't clean. For example symbol , so once it reaches json_encode(), it fails with json_encode(): Invalid UTF-8 sequence in argument.

I've tried utf8_encode() and utf8_decode(), even with mb_check_encoding() but it keeps getting through and causing havoc.

Running PHP 5.3.10 on Mac. So the question is - how can I clean up invalid utf8 symbols, keeping the rest of data, so that json_encoding() would work?

Update. Here is a way to reproduce it:

echo json_encode(pack("H*" ,'c32e'));
AJReading
Artjom Kurapov
  • Are you sure you're retrieving your data encoded in UTF-8 from the database? What data do you have, what do you expect? Show us a `bin2hex` of the problematic data. – deceze Apr 18 '12 at 08:38
  • `"\xC3\x2E"` is indeed not a valid UTF-8 string. Where is it coming from? MySQL should not output invalid UTF-8 strings if it's set to return UTF-8. – deceze Apr 18 '12 at 09:13
  • I'm having the same issue but querying from SQL Server via an ODBC connection and my special character is: ®. It seems that no one has resolved this issue. – Salsero69 Sep 24 '12 at 18:19
  • Have you tried the JSON_UNESCAPED_UNICODE flag? – Benubird Aug 05 '13 at 14:01

11 Answers

32

I had a similar error which caused json_encode to return a null field whenever a string contained a high-ASCII character such as a curly apostrophe, because the query was returning data in the wrong character set.

The solution was to make sure the data comes back as UTF-8 by adding:

mysql_set_charset('utf8');

after the mysql connect statement.

Robert Imhoff
23

It seems the symbol was Å. Since the data consists of surnames that shouldn't be public, only the first letter was shown, and that was done with just $lastname[0], which is wrong for multibyte strings and caused the whole hassle. I changed it to mb_substr($lastname, 0, 1) and it works like a charm.
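To illustrate why byte indexing breaks here, the following is a minimal sketch rather than the original code. The sample surname is hypothetical, and it assumes the mbstring extension is available. `$lastname[0]` returns only the first byte of the two-byte "Å", while `mb_substr()` returns the whole character.

```php
<?php
// Hypothetical sample value: "Åkesson", with "Å" written as its UTF-8 bytes C3 85.
$lastname = "\xC3\x85kesson";

// Byte indexing yields only the lead byte 0xC3, which is not valid UTF-8 on its own.
$firstByte = $lastname[0];

// mb_substr() works on characters, so it returns the complete two-byte "Å".
$firstChar = mb_substr($lastname, 0, 1, 'UTF-8');

var_dump(bin2hex($firstByte));                    // string(2) "c3"
var_dump(mb_check_encoding($firstByte, 'UTF-8')); // bool(false)
var_dump(bin2hex($firstChar));                    // string(4) "c385"
var_dump(json_encode($firstChar));                // string(8) "\u00c5" (including the JSON quotes)
```

Passing `$firstByte` to json_encode() is what produces the "Invalid UTF-8 sequence" failure described in the question.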

Artjom Kurapov
  • I just stumbled on the same problem; turns out I had a `substr()` call in there akin to your `[0]` dereference ;-) – Ja͢ck Aug 10 '12 at 01:48
  • In addition, json_encode can throw this error after a preg_replace that uses a regexp without the `u` modifier. – rNix Sep 15 '14 at 02:39
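The `u` modifier point from the last comment can be sketched as follows. The input string and the pattern are made up for illustration. Without `u`, `.` matches a single byte, so a multibyte character can be cut in half.

```php
<?php
// Hypothetical input: "Åb", with "Å" written as its UTF-8 bytes C3 85.
$name = "\xC3\x85b";

// Without /u, "." matches one byte: only 0xC3 is removed and a stray 0x85 is left behind.
$bytewise = preg_replace('/^./', '', $name);

// With /u, "." matches one whole UTF-8 character.
$charwise = preg_replace('/^./u', '', $name);

var_dump(bin2hex($bytewise));                    // string(4) "8562"
var_dump(mb_check_encoding($bytewise, 'UTF-8')); // bool(false)
var_dump($charwise);                             // string(1) "b"
```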
22

The problem is that this character is UTF-8, but json_encode does not handle it correctly. What's more, there is a whole list of other characters (see the Unicode characters list) that will trigger the same error, so stripping out this one (Å) will not fix the issue completely.

What we did was convert these characters to HTML entities like this:

htmlentities( (string) $value, ENT_QUOTES, 'utf-8', FALSE);
serge.k
13

Make sure that your connection charset to MySQL is UTF-8. It often defaults to ISO-8859-1, which means that the MySQL driver will convert the text to ISO-8859-1.

You can set the connection charset with mysql_set_charset, mysqli_set_charset or with the query SET NAMES 'utf8' (note that MySQL spells the charset name utf8, without a hyphen).

Emil Vikström
10

Using this code might help. It solved my problem!

mb_convert_encoding($post["post"],'UTF-8','UTF-8');

or, for an arbitrary string:

mb_convert_encoding($string,'UTF-8','UTF-8');
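A short sketch of what this conversion does with the byte sequence from the question, assuming the mbstring extension is loaded. Converting from UTF-8 to UTF-8 replaces invalid sequences with the mbstring substitute character (by default `?`), so the rest of the string survives. The surrounding "abc" and "def" are made-up filler, and the exact number of substitute characters may differ between PHP versions.

```php
<?php
// The invalid sequence from the question, wrapped in hypothetical valid text.
$dirty = "abc" . pack("H*", "c32e") . "def";

// A UTF-8 to UTF-8 conversion substitutes bytes that are not valid UTF-8.
$clean = mb_convert_encoding($dirty, 'UTF-8', 'UTF-8');

var_dump(mb_check_encoding($dirty, 'UTF-8')); // bool(false)
var_dump(mb_check_encoding($clean, 'UTF-8')); // bool(true)

// json_encode() now succeeds instead of reporting an invalid UTF-8 sequence.
echo json_encode($clean), "\n";
```

Note that this hides the bad bytes rather than recovering the original characters, so it is a cleanup step, not a fix for whatever produced them.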
Can Uludağ
3

The symbol you posted is the placeholder symbol for a broken byte sequence. Basically, it's not a real symbol but an error in your string.

What is the exact byte value of the symbol? Blindly applying utf8_encode is not a good idea; it's better to first find out where the byte(s) came from and what they mean.
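One way to look at the exact bytes, sketched here with the reproduction value from the question (the variable names are illustrative): dump the string with bin2hex(), check it with mb_check_encoding(), and ask json_last_error() why encoding failed.

```php
<?php
// The reproduction value from the question; in practice this would be the suspect column value.
$value = pack("H*", "c32e");

// Show the raw bytes instead of a graphical placeholder.
echo bin2hex($value), "\n"; // c32e

// 0xC3 starts a two-byte sequence, but 0x2E is not a continuation byte.
$isValid = mb_check_encoding($value, 'UTF-8');
var_dump($isValid); // bool(false)

// After a failed json_encode(), json_last_error() reports the reason.
$json = json_encode($value);
$failedOnUtf8 = (json_last_error() === JSON_ERROR_UTF8);
var_dump($failedOnUtf8); // bool(true)
```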

Evert
  • I doubt that I can trace back where that symbol came from – Artjom Kurapov Apr 18 '12 at 08:40
  • It may be possible that you saved it with the wrong charset. You should always make sure that you KNOW the charset of all your strings so you never even save them wrong. Now you need to find out which strings are saved in the wrong charset and find a way to convert them to the correct one, or possibly throw away the invalid strings. It may not always be possible to convert the strings since you may have lost bytes along the way if you worked with mixed encodings. – Emil Vikström Apr 18 '12 at 08:43
  • Well, if you can't find out why that symbol is in there, then try to post *what* symbol it is. The graphical placeholder doesn't add that much information :) – Evert Apr 18 '12 at 08:46
  • I see your answer. Glad you figured it out :) – Evert Apr 18 '12 at 11:06
0

Another thing that triggers this error with PHP's json_encode function is Unicode escape sequences written with an uppercase \U instead of a lowercase \u.

rharvey
0

json_encode works only with UTF-8 data, so you'll have to ensure that your data is in UTF-8. Alternatively, you can use iconv() to convert your results to UTF-8 before feeding them to json_encode().
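A minimal sketch of the iconv() route, assuming the source data is ISO-8859-1. That source encoding is an assumption; iconv() can only convert correctly when you know the actual encoding. The sample byte is "®" as it would arrive from a Latin-1 connection.

```php
<?php
// Hypothetical value: "®" encoded as ISO-8859-1 is the single byte 0xAE.
$latin1 = "\xAE";

// Convert from the known source encoding to UTF-8 before encoding as JSON.
$utf8 = iconv('ISO-8859-1', 'UTF-8', $latin1);

var_dump(bin2hex($utf8));     // string(4) "c2ae"
var_dump(json_encode($utf8)); // string(8) "\u00ae" (including the JSON quotes)
```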

Deepika Patel
0

Update: I solved this issue by specifying the charset in the PDO connection string, as below:

"mysql:host=$host;dbname=$db;charset=utf8"

All data received was then in the correct charset for the rest of the code to use.
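For context, a sketch of how such a DSN might be used. The helper names `makeUtf8Dsn()` and `connectUtf8()` and all connection details are hypothetical, and the connect function is defined but not called here because it needs a live server. Note that the `charset` element of the DSN is only honoured from PHP 5.3.6 onward.

```php
<?php
// Hypothetical helper: builds the DSN shown above for a given host and database.
function makeUtf8Dsn($host, $db, $charset = 'utf8')
{
    return "mysql:host=$host;dbname=$db;charset=$charset";
}

// Hypothetical helper: opens a PDO connection whose result strings arrive as UTF-8.
function connectUtf8($host, $db, $user, $pass)
{
    return new PDO(
        makeUtf8Dsn($host, $db),
        $user,
        $pass,
        array(PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION) // throw on errors instead of failing silently
    );
}

echo makeUtf8Dsn('localhost', 'app'), "\n"; // mysql:host=localhost;dbname=app;charset=utf8
```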

Jamie Deakin
0
I am very late, but anyone building a REST API with Slim who gets the same error can solve it by adding the line marked below:

<?php

// DbConnect.php file
class DbConnect
{
    //Variable to store database link
    private $con;

    //Class constructor
    function __construct()
    {

    }

    //This method will connect to the database
    function connect()
    {
        //Including the constants.php file to get the database constants
        include_once dirname(__FILE__) . '/Constants.php';

        //connecting to mysql database
        $this->con = new mysqli(DB_HOST, DB_USERNAME, DB_PASSWORD, DB_NAME);

        //Checking if any error occurred while connecting
        if (mysqli_connect_errno()) {
            echo "Failed to connect to MySQL: " . mysqli_connect_error();
        } else {
            //only set the charset once the connection has succeeded
            mysqli_set_charset($this->con, "utf8"); // add this line
        }

        //finally returning the connection link
        return $this->con;
    }
}
inrsaurabh
-2

Using setLocale('fr_FR.UTF8') before json_encode solved the problem.

KeizerBridge