
I am writing a web service for a multilingual search using the Yii2 Framework and the Sphinx Search server. The search works fine on the website, but the web service throws this error:

{
  "name": "PHP Warning",
  "message": "json_encode(): Invalid UTF-8 sequence in argument",
  "code": 2,
  "type": "yii\\base\\ErrorException",
  "file": "C:\\xampp\\htdocs\\my-website\\vendor\\yiisoft\\yii2\\helpers\\BaseJson.php",
  "line": 38,
  "stack-trace": [
    "#0 [internal function]: yii\\base\\ErrorHandler->handleError(2, 'json_encode(): ...', 'C:\\xampp\\htdocs...', 38, Array)",
    "#1 C:\\xampp\\htdocs\\my-website\\vendor\\yiisoft\\yii2\\helpers\\BaseJson.php(38): json_encode(Array, 320)",
    "#2 C:\\xampp\\htdocs\\my-website\\vendor\\yiisoft\\yii2\\web\\JsonResponseFormatter.php(53): yii\\helpers\\BaseJson::encode(Array)"
  ]
}

I have seeded lots of dummy "real text" into the db using the fzaninotto/Faker package, in English and French. The Sphinx search supports "Search by Title" and "Search by Category ID". What I found is that this error shows up only when a search is made using some particular category IDs that have a French translation. So it seems that when Sphinx indexes the categories, the text no longer comes back as UTF-8, and the error occurs when the data is retrieved from Sphinx. Dumping the data without json_encode() gives somewhat garbled category text that does not exist in the database table.

Here is my Sphinx config:

index tender
{
    source          = post
    path            = C:/xampp/htdocs/sphinx/data/posts
    min_infix_len   = 3
    enable_star     = true
    charset_type    = utf-8
    charset_table   = 0..9, A..Z->a..z, _, a..z, \
                    U+410..U+42F->U+430..U+44F, U+430..U+44F, \
                    U+C5->U+E5, U+E5, U+C4->U+E4, U+E4, U+D6->U+F6, U+F6, U+16B, U+0c1->a, U+0c4->a, \
                    U+0c9->e, U+0cd->i, U+0d3->o, U+0d4->o, U+0da->u, U+0dd->y, U+0e1->a, U+0e4->a, \
                    U+0e9->e, U+0ed->i, U+0f3->o, U+0f4->o, U+0fa->u, U+0fd->y, U+104->U+105, U+105, \
                    U+106->U+107, U+10c->c, U+10d->c, U+10e->d, U+10f->d, U+116->U+117, U+117, \
                    U+118->U+119, U+11a->e, U+11b->e, U+12E->U+12F, U+12F, U+139->l, U+13a->l, \
                    U+13d->l, U+13e->l, U+141->U+142, U+142, U+143->U+144, U+144, U+147->n, \
                    U+148->n, U+154->r, U+155->r, U+158->r, U+159->r, U+15A->U+15B, U+15B, U+160->s, \
                    U+160->U+161, U+161->s, U+164->t, U+165->t, U+16A->U+16B, U+16B, U+16e->u, \
                    U+16f->u, U+172->U+173, U+173, U+179->U+17A, U+17A, U+17B->U+17C, U+17C, U+17d->z, U+17e->z
}

and inside the source config, the category attributes are defined like this:

source post    
{
    # above is the query, config and other attributes
    sql_field_string    = cat_name_attr      # English category
    sql_field_string    = cat_name_attr_fr   # French category
}
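A common cause of exactly this symptom, offered here as an assumption because the query part of the source is elided above: if the source does not run `SET NAMES utf8` before the main query, the indexer's MySQL connection may fall back to the server's default charset (often latin1), and the `sql_field_string` attributes then get stored as latin1 bytes. A sketch of the source block with that pre-query added (`sql_query_pre` is a standard Sphinx source directive; the comment line stands in for the existing settings):

```
source post
{
    # ... existing sql_host / sql_user / sql_query settings ...

    # Ask MySQL to send UTF-8 to the indexer instead of the
    # connection's default charset (often latin1)
    sql_query_pre       = SET NAMES utf8

    sql_field_string    = cat_name_attr      # English category
    sql_field_string    = cat_name_attr_fr   # French category
}
```

The stored string attributes only change once the index is rebuilt with `indexer` after this edit.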

and the Sphinx connection config in Yii2:

    'sphinx' => [
        'class' => 'yii\sphinx\Connection',
        'dsn' => 'mysql:host=127.0.0.1;port=9306;',
        'username' => 'root',
        'password' => '',
        'charset' => 'utf8',
    ],

I tried to configure everything for UTF-8 collation and even called

json_encode($records, JSON_UNESCAPED_UNICODE); // $records stores the data

but they didn't work :'(
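Why the flag changes nothing: frame #1 of the stack trace shows Yii2 already calling `json_encode(Array, 320)`, and 320 is `JSON_UNESCAPED_UNICODE` (256) | `JSON_UNESCAPED_SLASHES` (64). That flag only controls whether *valid* non-ASCII characters are escaped; it cannot encode bytes that are not valid UTF-8. A minimal sketch, using a hypothetical `$records` array holding the French category as latin1 bytes (`JSON_INVALID_UTF8_SUBSTITUTE` needs PHP 7.2 or later):

```php
<?php
// Hypothetical sample: the French category name as latin1 bytes.
// 0xE9 is "é" in ISO-8859-1, but on its own it is not valid UTF-8.
$records = ['cat_name_attr_fr' => "Electricit\xE9 et \xE9lectroniques"];

// JSON_UNESCAPED_UNICODE only decides whether valid non-ASCII characters are
// written raw or as \uXXXX escapes; invalid bytes still make encoding fail.
$plain = json_encode($records, JSON_UNESCAPED_UNICODE);
$plainError = json_last_error();

var_dump($plain);                          // bool(false)
var_dump($plainError === JSON_ERROR_UTF8); // bool(true)

// PHP 7.2+: swap each invalid byte for U+FFFD instead of failing.
// This silences the error, but the "é" is still lost in the output.
$substituted = json_encode(
    $records,
    JSON_UNESCAPED_UNICODE | JSON_INVALID_UTF8_SUBSTITUTE
);
echo $substituted, PHP_EOL;
```

`JSON_INVALID_UTF8_SUBSTITUTE` only hides the symptom: the output contains the replacement character where "é" should be, so the encoding still has to be fixed before the data reaches `json_encode()`.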

What I have confirmed is that when a translated category is indexed into Sphinx storage, its charset changes and the result can no longer be encoded as JSON. For example, the French translation of the "Electrics and Electronics" category is stored in the MySQL table as

"Electricité et électroniques"

but once indexed into Sphinx, it comes back like this:

"Electricit� et �lectroniques"

and json_encode() fails on that text.
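To confirm which values are affected, a small diagnostic can scan the result rows for strings that are not valid UTF-8 before they reach `json_encode()`. This is a sketch: `findInvalidUtf8Fields()` and the sample rows are hypothetical, and it relies on `preg_match()` returning `false` for a malformed UTF-8 subject when the `/u` modifier is used:

```php
<?php
/**
 * Hypothetical helper: list "rowIndex.field" for every string value in the
 * result rows that is not valid UTF-8. preg_match() with /u returns false
 * (not 1) when the subject contains malformed UTF-8.
 */
function findInvalidUtf8Fields(array $rows): array
{
    $bad = [];
    foreach ($rows as $i => $row) {
        foreach ($row as $field => $value) {
            if (is_string($value) && preg_match('//u', $value) !== 1) {
                $bad[] = "$i.$field";
            }
        }
    }
    return $bad;
}

// Sample rows standing in for a Sphinx result set (hypothetical data).
$rows = [
    [
        'cat_name_attr'    => 'Electrics and Electronics',
        'cat_name_attr_fr' => "Electricit\xE9 et \xE9lectroniques", // latin1 bytes
    ],
    ['cat_name_attr' => 'Books', 'cat_name_attr_fr' => 'Livres'],
];

print_r(findInvalidUtf8Fields($rows));
```

For a value it flags, `bin2hex()` shows the raw bytes: a UTF-8 "é" appears as `c3a9`, whereas a latin1 "é" appears as a lone `e9`.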

1 Answer


The black diamond (�) is the browser's way of saying wtf. It comes from having latin1-encoded characters but telling the browser to display them as utf8.
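If the bytes really are latin1, each byte in the range 0x80 to 0xFF is the Unicode code point with the same number, so re-encoding is mechanical. A minimal stopgap sketch in PHP (`latin1ToUtf8()` is a hypothetical helper; with the mbstring extension, `mb_convert_encoding($s, 'UTF-8', 'ISO-8859-1')` does the same job):

```php
<?php
// Re-encode an ISO-8859-1 (latin1) string as UTF-8.
// Bytes 0x00-0x7F are identical in both encodings; bytes 0x80-0xFF map to
// the code point with the same value, which takes two bytes in UTF-8.
function latin1ToUtf8(string $s): string
{
    $out = '';
    $len = strlen($s);
    for ($i = 0; $i < $len; $i++) {
        $byte = ord($s[$i]);
        $out .= $byte < 0x80
            ? $s[$i]
            : chr(0xC0 | ($byte >> 6)) . chr(0x80 | ($byte & 0x3F));
    }
    return $out;
}

$fromSphinx = "Electricit\xE9 et \xE9lectroniques"; // latin1 bytes (hypothetical sample)
$fixed = latin1ToUtf8($fromSphinx);

echo $fixed, PHP_EOL; // Electricité et électroniques
echo json_encode(['cat_name_attr_fr' => $fixed], JSON_UNESCAPED_UNICODE), PHP_EOL;
```

Only apply this to values that are actually latin1: running it over text that is already UTF-8 double-encodes it (for example "é" turns into "Ã©"). Getting the charset right where the data enters the index avoids the conversion altogether.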

You could tell the browser to expect latin1 with <meta ... charset=ISO-8859-1>.

Sometimes this occurs together with question marks (?), in which case the original characters were already lost when the data was stored, and you must start over.

Rick James