2

I'm having problems.

I have a REST API that uses json_encode to output data as JSON. However, data that is not UTF-8 sometimes gets pushed to the API. When I then try to output this data, json_encode throws an exception, because it can only handle UTF-8.

What should I do? Can I somehow force all incoming data to be UTF-8? This seems hard, because I have no information about which encoding the data is sent in.

Or should I try to run json_encode on the incoming data and return an error if it cannot be encoded?

EDIT: I forgot to mention that this is a REST API. So I get POST requests to my API with lots of fields and values.

Sebastian Hoitz
  • Where does the data come from? – Emyr May 04 '11 at 13:54
  • Is the incoming data always either UTF-8 or ISO-8859-1, or are other encodings also in the mix? How international is this? – Pekka May 04 '11 at 13:57
  • All different. Some comes from a Twitter StreamingApi Client, some from E-Mail. – Sebastian Hoitz May 04 '11 at 14:03
  • possible duplicate of [PHP: Convert any string to UTF-8 without knowing the original character set, or at least try](http://stackoverflow.com/questions/7979567/php-convert-any-string-to-utf-8-without-knowing-the-original-character-set-or) – fkoessler Sep 09 '15 at 09:53

4 Answers

2

You might be able to use mb_detect_encoding() to guess which character encoding you're getting, but the heuristics involved are less than 100% reliable, so it might still not work and, worse, you might mangle a string that was valid.

If the JSON source is sending a Content-Type header, it should also include the (intended) character encoding.

   Content-Type: application/json; charset=ISO-8859-4

If this information is accurate then you could use it to do the transcoding.
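As a rough sketch of that idea, the snippet below pulls the charset parameter out of a Content-Type value and uses it to transcode the body. The helper names charsetFromContentType() and bodyToUtf8() are made up for illustration, and the sketch assumes the mbstring and iconv extensions are available and that the sender's header is truthful.

```php
<?php
// Hypothetical helper: extract the charset parameter from a Content-Type value.
function charsetFromContentType(string $contentType): ?string
{
    if (preg_match('/;\s*charset\s*=\s*"?([^\s";]+)"?/i', $contentType, $m)) {
        return strtoupper($m[1]);
    }
    return null; // the header carried no charset parameter
}

// Hypothetical helper: transcode $body to UTF-8 using the declared charset.
// Returns null when no charset was declared or the conversion fails.
function bodyToUtf8(string $body, string $contentType): ?string
{
    $charset = charsetFromContentType($contentType);
    if ($charset === null) {
        return null;
    }
    if ($charset === 'UTF-8') {
        // Nothing to convert, but verify the claim before trusting it.
        return mb_check_encoding($body, 'UTF-8') ? $body : null;
    }
    // iconv() returns false and raises a notice/warning on failure;
    // @ silences that so the caller only sees null.
    $converted = @iconv($charset, 'UTF-8', $body);
    return $converted === false ? null : $converted;
}
```

In a request handler this could be called with something like file_get_contents('php://input') and $_SERVER['CONTENT_TYPE'], returning an error response when the result is null.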

GordonM
  • +1 for getting the encoding information from the sender would be the safest practice to rely on – breiti Nov 13 '11 at 16:48
1

You could use mb_detect_encoding to detect the encoding of the incoming data, then use iconv to convert the data to UTF-8.
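A minimal sketch of that approach could look like the following. The function names guessToUtf8() and fieldsToUtf8() and the default candidate list are assumptions for illustration, not part of any library. Strings that are already valid UTF-8 are returned untouched, and detection is only attempted on the rest. Note that a single-byte encoding such as ISO-8859-1 accepts any byte sequence, so for such candidates the "detection" is really just a guess.

```php
<?php
// Hypothetical helper: guess the encoding of $value and convert it to UTF-8.
// $candidates is an assumed list of legacy encodings the API expects to see.
function guessToUtf8(string $value, array $candidates = ['ISO-8859-1']): ?string
{
    // Leave valid UTF-8 alone so that correct input is never mangled.
    if (mb_check_encoding($value, 'UTF-8')) {
        return $value;
    }
    // Strict mode rejects candidates in which the bytes are invalid.
    $detected = mb_detect_encoding($value, $candidates, true);
    if ($detected === false) {
        return null; // none of the candidates matched
    }
    $converted = iconv($detected, 'UTF-8', $value);
    return $converted === false ? null : $converted;
}

// Hypothetical helper: apply the conversion to every string field of a request,
// leaving non-string values as they are.
function fieldsToUtf8(array $fields): array
{
    return array_map(
        fn ($v) => is_string($v) ? guessToUtf8($v) : $v,
        $fields
    );
}
```

Calling fieldsToUtf8($_POST) would cover a flat POST body; nested arrays would need a recursive variant.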

Josh
Craig Sefton
  • So should I run this on every field in the POST request that is sent to my API? – Sebastian Hoitz May 04 '11 at 14:04
  • Yes, any data that needs to be converted to UTF-8. If you're sure that all data POSTed in one request will be of the same encoding (i.e. you don't expect a mix of ISO-8859-1 and Chinese characters in the same request), then detect encoding on one field, and use that to convert all of them. Should be easy to write a very basic function to do it for you. – Craig Sefton May 04 '11 at 14:09
0

You might want to check out iconv()

iconv — Convert string to requested character encoding

http://www.php.net/manual/en/function.iconv.php
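To illustrate, here is a small sketch that assumes the source encoding is known to be ISO-8859-1 (the sample string is invented). It shows json_encode() failing on the raw bytes and succeeding after iconv(). On PHP 5.5 and later, without the JSON_THROW_ON_ERROR flag, the failure shows up as a false return value rather than an exception.

```php
<?php
// Example input: "Zürich" encoded as ISO-8859-1 (the byte 0xFC is "ü").
$latin1 = "Z\xFCrich";

// Encoding the raw bytes fails: json_encode() returns false and
// json_last_error() reports JSON_ERROR_UTF8.
$failed = json_encode(['city' => $latin1]);
$error  = json_last_error();

// Converting first, with the source encoding stated explicitly, succeeds.
$utf8 = iconv('ISO-8859-1', 'UTF-8', $latin1);
$json = json_encode(['city' => $utf8]);
// $json is now {"city":"Z\u00fcrich"}
```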

Wesley Murch
0

I prefer the mbstring functions. Here is the sample from php.net:

    /* Convert internal character encoding to SJIS */
    $str = mb_convert_encoding($str, "SJIS");

    /* Convert EUC-JP to UTF-7 */
    $str = mb_convert_encoding($str, "UTF-7", "EUC-JP");

    /* Auto detect encoding from JIS, eucjp-win, sjis-win, then convert str to UCS-2LE */
    $str = mb_convert_encoding($str, "UCS-2LE", "JIS, eucjp-win, sjis-win");

    /* "auto" is expanded to "ASCII,JIS,UTF-8,EUC-JP,SJIS" */
    $str = mb_convert_encoding($str, "EUC-JP", "auto");
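Adapted to the question, where the target is always UTF-8, the same function might be used as sketched below. The helpers toUtf8From() and invalidUtf8Fields() are hypothetical names. The first assumes the caller knows the source encoding, and the second covers the "reject what cannot be encoded" option by listing the fields that are not valid UTF-8.

```php
<?php
// Hypothetical helper: convert $value to UTF-8 from a known source encoding,
// skipping strings that are already valid UTF-8.
function toUtf8From(string $value, string $sourceEncoding): ?string
{
    if (mb_check_encoding($value, 'UTF-8')) {
        return $value;
    }
    $converted = mb_convert_encoding($value, 'UTF-8', $sourceEncoding);
    return is_string($converted) ? $converted : null;
}

// Hypothetical helper: return the names of fields whose string values are
// not valid UTF-8, so the API can answer with an error instead of guessing.
function invalidUtf8Fields(array $fields): array
{
    $bad = [];
    foreach ($fields as $name => $value) {
        if (is_string($value) && !mb_check_encoding($value, 'UTF-8')) {
            $bad[] = $name;
        }
    }
    return $bad;
}
```

If invalidUtf8Fields($_POST) comes back non-empty, the API could respond with a 400 status naming those fields, which keeps the decision about encodings with the sender.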
Damien