24

I am a novice in HTTP-related matters. In iOS development, I would like to send a string in an HTTP header, so I'm using:

[httpRequest setValue:@"nonEnglishString" forHTTPHeaderField:@"customHeader"];

The receiving server is Python (Google App Engine), which saves the string value in the db model as a StringProperty using:

dataEntityInstance.nonEnglishString = unicode(self.request.headers.get('customHeader'))

However, when I try to send a non-English string, such as Korean, it is stored in the HTTP header like this:

Customheader = "\Uc8fc\Uba39\Uc774 \Uc6b4\Ub2e4";

and when it's received by Google App Engine and saved in the DataStore, it becomes:

??? ??

as if it can't find the proper characters for the unicode value.

Is it not POSSIBLE or ALLOWED to send non-English string using HTTP Header?

If my iOS app just uses setHTTPBody, it can transfer non-English strings and save them to App Engine's DataStore properly.

[httpRequest setHTTPBody:[httpBody dataUsingEncoding:NSUTF8StringEncoding]];

But I just can't find the right way to achieve the same goal using HTTP headers, as many APIs like Foursquare's do, and to save the strings in their proper form in the Python-based Google App Engine DataStore.

petershine

3 Answers

29

Is it not POSSIBLE or ALLOWED to send non-English string using HTTP Header?

It's not possible as per HTTP standards to put non-ISO-8859-1 characters directly in an HTTP header. That gives you ASCII ("English"?) characters plus common Western European diacriticals.

However in practice you can't even use the extended ISO-8859-1 characters, because servers and browsers don't agree on what to do with non-ASCII characters in headers. Safari takes RFC2616 at its word and treats high bytes as ISO-8859-1 characters; Mozilla takes UTF-16 code unit low bytes, which is similar but weirder; Opera and Chrome decode from UTF-8; IE uses the local system code page.

So in reality all you can put in an HTTP header is simple ASCII with no control codes. If you want anything more, you'll have to come up with an encoding scheme (eg UTF-8+base64). The RFC2616 standard suggests RFC2047 encoded-words as a standard form of encoding, but this makes no sense given the definitions of when they are allowable in RFC2047 itself, and nothing supports it.
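The "UTF-8+base64" scheme mentioned above could be sketched in Python as follows. This is only an illustration of an ad hoc, application-specific convention: the helper names `encode_header_value` and `decode_header_value` are hypothetical, and the client and server would both have to agree to apply them to the header in question.

```python
import base64


def encode_header_value(text):
    """Turn arbitrary Unicode text into an ASCII-only string.

    Hypothetical helper: UTF-8 encode first, then base64, so the
    result only contains characters that are safe in a header value.
    """
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def decode_header_value(value):
    """Reverse encode_header_value on the receiving side."""
    return base64.b64decode(value).decode("utf-8")


# The Korean string from the question, written with escapes.
original = "\uc8fc\uba39\uc774 \uc6b4\ub2e4"
wire_value = encode_header_value(original)

# What travels in the header is plain ASCII ...
assert wire_value.isascii()
# ... and the receiver gets the original text back.
assert decode_header_value(wire_value) == original
```

On the iOS side, the client would need to apply the same two steps (UTF-8 bytes, then base64) before calling `setValue:forHTTPHeaderField:`.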

bobince
  • What do you mean by *(…) this makes no sense given the definitions of when they are allowable in RFC2047 itself (…)*? – Piotr Dobrogost Oct 04 '17 at 13:34
  • RFC 2047 section 5 states that encoded-words can go where RFC 822 ‘text’, ‘comment’ and ‘phrase’ go, but RFC 2616 isn't an RFC 822-family standard and doesn't have tokens that match those. (There is a TEXT token but it's not defined the same.) It explicitly states that they mustn't go in a ‘quoted-string’; there is a very similar ‘quoted-string’ token defined in RFC 2616 and that's the one place where you most want to put non-ASCII characters in practice (because of Content-Disposition and similar parameterised headers). – bobince Oct 12 '17 at 21:16
  • Anyhow, from a standards perspective this is cleared up now: RFC 5987 provides for a standard way to encode non-ASCII in parameterised headers, and RFC 7230 recommends non-legacy headers be ASCII. – bobince Oct 12 '17 at 21:25
5

It is possible to use character sets other than ISO 8859-1 in HTTP headers, but they must be encoded as described in RFC 2047.
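As a rough sketch of what an RFC 2047 encoded-word looks like, Python's standard `email.header` module can produce and parse one. The helper names `to_encoded_word` and `from_encoded_word` are made up for this example, and nothing here implies that an HTTP server or framework will decode such a value automatically; both ends would have to do it explicitly.

```python
from email.header import Header, decode_header


def to_encoded_word(text):
    """Encode text as an RFC 2047 encoded-word using the UTF-8 charset."""
    return Header(text, charset="utf-8").encode()


def from_encoded_word(value):
    """Decode a value that may contain RFC 2047 encoded-words."""
    parts = []
    for chunk, charset in decode_header(value):
        if isinstance(chunk, bytes):
            # Unlabelled byte chunks are treated as ASCII (an assumption).
            parts.append(chunk.decode(charset or "ascii"))
        else:
            parts.append(chunk)
    return "".join(parts)


# The Korean string from the question, written with escapes.
original = "\uc8fc\uba39\uc774 \uc6b4\ub2e4"
encoded = to_encoded_word(original)

# The encoded form is ASCII-only and round-trips to the original text.
assert encoded.isascii()
assert from_encoded_word(encoded) == original
```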

Ignacio Vazquez-Abrams
  • If this means the client, my iOS app, should encode the NSString per RFC 2047 before setting it as an HTTP header value, could you point me to iOS or Objective-C source that handles this task? A solution seems hard to find. – petershine Mar 24 '11 at 17:52
  • This prescription in RFC2616 is, unfortunately, bogus. RFC2047 encoded-words explicitly cannot go in any of the places you might want to use them in an HTTP header, since they're not based on RFC-822-family atoms. The reference to 2047 has been removed from HTTPbis work on future standards. You can of course use encoded-words as an ad hoc application-specific form of encoding, if you want (but straight UTF-8 base64 might be easier). – bobince Mar 24 '11 at 22:56
0

RFC 8187 describes a way to pass a header parameter value in a different encoding:

Extended notation, using the Unicode character U+00A3 ("£", POUND SIGN):

     foo: bar; title*=utf-8'en'%C2%A3%20rates
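
The extended notation above can be reproduced with a small Python sketch. The function names `encode_ext_value` and `decode_ext_value` are hypothetical, and the code assumes the charset is always UTF-8; it percent-encodes every byte outside the RFC 8187 `attr-char` set.

```python
from urllib.parse import quote, unquote

# Characters besides letters and digits that RFC 8187 allows unencoded
# (the non-alphanumeric part of the attr-char set).
ATTR_CHAR_EXTRAS = "!#$&+-.^_`|~"


def encode_ext_value(text, language=""):
    """Build charset'language'percent-encoded-value, with UTF-8 as charset."""
    encoded = quote(text, safe=ATTR_CHAR_EXTRAS, encoding="utf-8")
    return "utf-8'%s'%s" % (language, encoded)


def decode_ext_value(value):
    """Split an ext-value into its three parts and decode the text."""
    charset, _language, encoded = value.split("'", 2)
    return unquote(encoded, encoding=charset)


# Reproduces the title* parameter value from the RFC 8187 example.
value = encode_ext_value("\u00a3 rates", "en")
assert value == "utf-8'en'%C2%A3%20rates"
assert decode_ext_value(value) == "\u00a3 rates"
```

Note that this notation applies to parameters of a header (the `title*` part), not to a bare header value.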
Alexander Goldabin