I'm storing text in a DB as UTF-8.

When a post is sent via JS to my API, symbols such as ö come back as "Ã¶".

My website's HTML is declared as:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

My API output is sent out with a header declaring utf-8, like so:

$status_header = 'HTTP/1.1 '.$status.' '.self::getStatusCodeMessage($status);
header($status_header);
header('Content-type: ' . $content_type.'; charset=utf-8');

if ($body !== '') {
    echo $body;
}

The only way I've managed to get round this is by post-processing my output in PHP like this:

private static function fixText($text) {

        // Keys are mojibake: UTF-8 bytes that were misread as cp1252.
        $replaceChars = array(
            "â€œ" => "\"",
            'â€¢' => '·',
            "â€" => "\"",
            "â€™" => "'",
            'Ã¶' => 'ö',

            'â€' => "'",

            "Ã©" => "é",
            "Ã«" => "ë",
            "Â£" => "£"
        );
        foreach($replaceChars as $oldChar => $newChar) {
            $text = str_replace($oldChar, $newChar, $text);
        }

        $text = iconv("UTF-8", "UTF-8//IGNORE", $text);
        return $text;
    }

Obviously this is not ideal as I have to keep adding more and more symbols to the map.


UPDATE:

A developer had sneakily added this code:

$document->text = mb_convert_encoding($document->text, mb_detect_encoding($document->text), "cp1252");

It was meant as a way to fix old Latin characters that were coming through damaged.

azz0r
  • What are the character set and collations of your db and tables ?? – M Khalid Junaid Aug 06 '13 at 10:19
  • It's good that you showed your current workaround, but it would be better if you also gave more details about what you are doing exactly. A meta tag and an assertion that your API output is sent as utf-8 is not much to go on. – Jon Aug 06 '13 at 10:30
  • Sorry I'm using mongoDB, as far as I'm aware mongoDB is always utf-8. Updated code samples also. – azz0r Aug 06 '13 at 10:33
  • See [UTF-8 all the way through](http://stackoverflow.com/q/279170/476) and [Handling Unicode Front To Back In A Web App](http://kunststube.net/frontback/) and [What Every Programmer Absolutely, Positively Needs To Know About Encodings And Character Sets To Work With Text](http://kunststube.net/encoding/). I don't know the specific setting for working with MongoDB, but the concepts are always the same. – deceze Aug 06 '13 at 10:40

1 Answer

Seeing those funny characters means that you have double-encoded UTF-8 stored. You don't show how you are adding data to the database. If you use utf8_encode() on already UTF-8 encoded strings, this will be your result.
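
To make the double encoding concrete, here is a minimal sketch, assuming the mbstring extension is loaded. The helper names `doubleEncode()` and `undoDoubleEncode()` are made up for illustration. If `mb_detect_encoding()` reports UTF-8 for your text, the `mb_convert_encoding(..., "cp1252")` line from the question's update should behave like `doubleEncode()` here.

```php
<?php
// Hypothetical helpers for illustration; requires the mbstring extension.

/** Simulate the bug: read UTF-8 bytes as Windows-1252, then encode to UTF-8 again. */
function doubleEncode(string $utf8): string
{
    return mb_convert_encoding($utf8, 'UTF-8', 'Windows-1252');
}

/** Reverse one round of double encoding. Only use on text known to be damaged. */
function undoDoubleEncode(string $damaged): string
{
    return mb_convert_encoding($damaged, 'Windows-1252', 'UTF-8');
}

$original = "\u{00F6}";                 // "ö", stored as bytes C3 B6
$damaged  = doubleEncode($original);    // displays as "Ã¶"

echo bin2hex($original), "\n";                    // c3b6
echo bin2hex($damaged), "\n";                     // c383c2b6
echo bin2hex(undoDoubleEncode($damaged)), "\n";   // c3b6
```

Reversing like this is a repair for data that is already damaged, not a substitute for removing the extra conversion. It may also fail to round-trip characters whose UTF-8 bytes have no Windows-1252 mapping (for example the 0x9D byte in a right double quote), which is probably why the replacement map keeps growing.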

MongoDB only accepts UTF-8, but you should not encode the text again yourself if you're already getting UTF-8 sent through to you by the webserver.
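
If some legacy input really does arrive in another encoding, one option is to convert only when the text is not already valid UTF-8. This is a rough sketch: `toUtf8Once()` is a hypothetical helper, and the assumption that non-UTF-8 input is Windows-1252 is mine, so adjust it to whatever your old data actually uses.

```php
<?php
// Hypothetical helper; requires the mbstring extension.
// Assumption: anything that is not valid UTF-8 is Windows-1252 legacy text.
function toUtf8Once(string $text, string $legacyEncoding = 'Windows-1252'): string
{
    // Already valid UTF-8: leave it alone so it is never encoded twice.
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text;
    }

    // Otherwise treat it as legacy bytes and convert exactly once.
    return mb_convert_encoding($text, 'UTF-8', $legacyEncoding);
}

echo bin2hex(toUtf8Once("\xC3\xB6")), "\n"; // UTF-8 "ö" passes through: c3b6
echo bin2hex(toUtf8Once("\xF6")), "\n";     // cp1252 "ö" is converted: c3b6
```

The check is a heuristic: a short legacy string can happen to be valid UTF-8 by accident, so fixing the encoding at the point where the data enters the system is still the more reliable route.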

Instead of:

header('Content-type: ' . $content_type.'; charset=utf-8');

Consider setting the default charset in php.ini:

default_charset=UTF-8
Derick