
When I call something like `json_encode($_SERVER)`, I get an error because the input is not valid UTF-8. I looked into it and noticed that some user agent strings are encoded in ISO-8859-1. How can I tell which encoding was used for the HTTP request, so that I can apply `utf8_encode()` or `iconv()` as appropriate before JSON-encoding the data?

nicolagi
  • HTTP headers like `User-Agent` should be ASCII. The HTTP request and response body are encoded using the text encoding specified in the `charset` attribute of the `Content-Type` header. – EricLaw Nov 17 '14 at 17:46
  • Thank you @EricLaw. So if it's not ASCII it means the browser simply sent a bad request? In my case the user agent contains "OrangeEspaña" for instance. – nicolagi Nov 17 '14 at 17:50
  • Any header not using ASCII is foolish. Having said that, historically, RFC2616 allowed field content in the ASCII superset `ISO-8859-1`. Any other character set is supposed to be escaped using the scheme of RFC2231 (not commonly supported). Some clients will use UTF-8 or %-escaped UTF-8 in headers; the latter is preferable in most cases. – EricLaw Nov 17 '14 at 19:24

1 Answer


From what I can tell, the standard doesn't say: http://www.w3.org/Protocols/rfc2616/rfc2616.html

A request body should declare its encoding in the `Content-Type` header, but header values should be plain ASCII. As far as I can tell, nothing beyond that is specified. I'm not sure a non-ASCII header is strictly wrong, but there's apparently no declared encoding you could pass to `iconv`.
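Since no encoding is declared, any conversion is a guess. One possible guess, sketched below, leans on the point from the comments that RFC 2616 historically allowed ISO-8859-1 in field content: leave valid UTF-8 alone and treat everything else as ISO-8859-1. The function name `header_to_utf8` is hypothetical, and the ISO-8859-1 fallback is an assumption that will mangle input in any other encoding.

```php
<?php
// Hypothetical helper: return $value as UTF-8, guessing ISO-8859-1
// for anything that is not already valid UTF-8.
function header_to_utf8(string $value): string
{
    // preg_match with the /u flag returns 1 only if the subject is valid UTF-8.
    if (preg_match('//u', $value) === 1) {
        return $value;
    }

    // Assumption: invalid UTF-8 is ISO-8859-1 (the historical RFC 2616 allowance).
    $converted = iconv('ISO-8859-1', 'UTF-8', $value);

    // iconv returns false on failure; fall back to the raw value in that case.
    return $converted === false ? $value : $converted;
}

// "\xF1" is the ISO-8859-1 byte for "ñ".
$json = json_encode(['ua' => header_to_utf8("OrangeEspa\xF1a")]);
```

This keeps characters such as the "ñ" in "OrangeEspaña", at the cost of possibly producing the wrong characters when the client used some other encoding.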

What I would do is just loop through the string and remove any non-ASCII characters. Maybe you could hack it out with a simple `str_replace` or `preg_replace` call; see the related question "remove non-ascii characters from string in php".
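A minimal sketch of that stripping approach, using `preg_replace`. The helper names `strip_non_ascii` and `ascii_only` are hypothetical, and the choice to keep only printable ASCII (bytes 0x20 to 0x7E) is an assumption; widen the range if you need tabs or other control characters.

```php
<?php
// Hypothetical helper: drop every byte outside the printable ASCII range.
function strip_non_ascii(string $value): string
{
    // Without the /u flag PCRE matches byte by byte, so this also works
    // on input that is not valid UTF-8.
    return preg_replace('/[^\x20-\x7E]/', '', $value) ?? '';
}

// Hypothetical helper: clean every string in an array, recursing into
// nested arrays and leaving non-string values untouched.
function ascii_only(array $data): array
{
    $clean = [];
    foreach ($data as $key => $value) {
        if (is_array($value)) {
            $clean[$key] = ascii_only($value);
        } elseif (is_string($value)) {
            $clean[$key] = strip_non_ascii($value);
        } else {
            $clean[$key] = $value;
        }
    }
    return $clean;
}

$json = json_encode(ascii_only($_SERVER));
```

Note that this is lossy: a user agent containing "OrangeEspaña" in ISO-8859-1 would come out as "OrangeEspaa".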

Adam D. Ruppe