I am using the accept-charset="utf-8" attribute on a form and found that when I do a form POST with non-ASCII characters, the request headers contain a different Accept-Charset value. Is there anything I am missing? My form looks like this:

<form method="post" action="controller" accept-charset="UTF-8">
  <!-- input text box -->
  <!-- submit button -->
</form>

Thanks in advance

insomiac
  • What user agents have you tried? Did you look at [any of the related links](http://stackoverflow.com/questions/3719974/is-there-any-benefit-to-adding-accept-charset-utf-8-to-html-forms-if-the-page)? – Dave L. Oct 11 '12 at 00:48

1 Answer

The question, as asked, is self-contradictory: the heading says that the accept-charset parameter does not do anything, whereas the question body says that when the accept-charset attribute (this is the correct term) is used, “the headers have different accept charset option in the request header”. I suppose a negation is missing from the latter statement.

Browsers send Accept-Charset parameters in HTTP request headers according to their own principles and settings. For example, my Chrome sends Accept-Charset:windows-1252,utf-8;q=0.7,*;q=0.3. Such a header is typically ignored by server-side software, but it could be used (and it was designed to be used) to determine which encoding is to be used in the server response, in case the server-side software (a form handler, in this case) is capable of using different encodings in the response.
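
To illustrate how server-side software could use such a header, here is a minimal Python sketch that parses an Accept-Charset value into charsets ordered by their q-values. The function name `parse_accept_charset` is made up for this example, and the parsing is deliberately simplified; it is not what any particular server or framework does.

```python
def parse_accept_charset(header_value):
    """Return (charset, q) pairs sorted by descending preference."""
    prefs = []
    for item in header_value.split(","):
        item = item.strip()
        if not item:
            continue
        name, _, params = item.partition(";")
        q = 1.0  # a missing q parameter means q=1
        for param in params.split(";"):
            key, _, value = param.strip().partition("=")
            if key.strip().lower() == "q":
                try:
                    q = float(value.strip())
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        prefs.append((name.strip().lower(), q))
    # sorted() is stable, so entries with equal q keep their header order
    return sorted(prefs, key=lambda pair: pair[1], reverse=True)


# The header value quoted above
prefs = parse_accept_charset("windows-1252,utf-8;q=0.7,*;q=0.3")
print(prefs)
```

A handler that supports several encodings could walk this list and pick the first charset it is able to produce.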

The accept-charset attribute in a form element is not expected to affect HTTP request headers, and it does not. It is meant to specify the character encoding to be used for the form data in the request, and this is what it actually does. The HTML 4.01 spec is obscure about this, but the W3C HTML5 draft puts it much better, though for some odd reason it uses the plural: “gives the character encodings that are to be used for the submission”. I suppose the reason is that you could specify alternate encodings, to prepare for situations where a browser is unable to use your preferred encoding. And what actually happens in Chrome, for example, is that if you use accept-charset="foobar utf-8", then UTF-8 is used.

In practice, the attribute is used to make the encoding of the data submission different from the encoding of the page containing the form. Suppose your page is ISO-8859-1 encoded and someone types Greek or Hebrew letters into your form. Browsers will have to do some error recovery, since those characters cannot be represented in ISO-8859-1. (In practice they turn the characters into numeric character references, which is logically all wrong but pragmatically perhaps the best they can do.) Using <form accept-charset=utf-8> helps here: no matter what the encoding of the page is, the form data will be sent UTF-8 encoded, which can handle any character.
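
The difference can be simulated outside a browser. The following Python sketch percent-encodes the same Greek text once as UTF-8 and once as ISO-8859-1, using numeric character references for unrepresentable characters to imitate the fallback described above. The helper name `encode_field` is hypothetical, and this only approximates what browsers do.

```python
from urllib.parse import quote_plus


def encode_field(value, charset):
    """Percent-encode one form value in the given charset.

    Characters the charset cannot represent are replaced with numeric
    character references (such as &#945;) before percent-encoding.
    """
    raw = value.encode(charset, errors="xmlcharrefreplace")
    return quote_plus(raw)


text = "αβ"  # Greek alpha and beta
utf8_form = encode_field(text, "utf-8")
latin1_form = encode_field(text, "iso-8859-1")

print(utf8_form)    # %CE%B1%CE%B2
print(latin1_form)  # %26%23945%3B%26%23946%3B
```

In the ISO-8859-1 case the server receives the literal text `&#945;&#946;` and cannot tell whether the user typed Greek letters or those exact ASCII characters, which is why sending UTF-8 is the safer choice.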

If you wish to tell the form handler which encoding it should use in its response, then you can add a hidden (or non-hidden) field into the form for that.
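
As a rough sketch of that idea, the Python code below reads a field named `response_charset` from a URL-encoded POST body and uses it to encode the reply. The field names `response_charset` and `name`, the function `build_response`, and the fallback to UTF-8 are all assumptions made for this example; the form data itself is assumed to have been submitted as UTF-8.

```python
import codecs
from urllib.parse import parse_qs


def build_response(body, default_charset="utf-8"):
    """Return (content_type, payload) encoded as the form requested."""
    # A URL-encoded body is pure ASCII; percent-escapes are decoded as UTF-8
    fields = parse_qs(body.decode("ascii"), encoding="utf-8")
    charset = fields.get("response_charset", [default_charset])[0]
    try:
        codecs.lookup(charset)  # raises LookupError for unknown names
    except LookupError:
        charset = default_charset
    name = fields.get("name", [""])[0]
    # Fall back to numeric character references for unrepresentable text
    payload = f"Hello, {name}".encode(charset, errors="xmlcharrefreplace")
    return f"text/plain; charset={charset}", payload


content_type, payload = build_response(
    b"name=%CE%B1&response_charset=iso-8859-7"
)
print(content_type, payload)
```

A real handler would probably restrict the accepted values to a short list of encodings it has been tested with, rather than accepting any codec name the client sends.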

Jukka K. Korpela
  • Thanks for the answer, it's helpful. Do you know how I can set the accept-charset default to UTF-8? – insomiac Oct 11 '12 at 22:10
  • The default for `accept-charset` is `UNKNOWN` as per HTML 4.01, but HTML5 drafts reflect the reality better: the default is the document’s character encoding. If you mean setting a default to be used in your authoring software, then it all depends on that software. – Jukka K. Korpela Oct 12 '12 at 05:55
  • What's the version of your Chrome? Accept-Charset is obsolete, you should not depend on it anymore: https://code.google.com/p/chromium/issues/detail?id=112804 – Y.L. Jul 02 '14 at 11:59
  • @CyberRusher, this is still relevant in Chrome 35. The `accept-charset` HTML attribute is distinct from the `Accept-Charset` HTTP header (which is what your linked document discusses). – Jukka K. Korpela Jul 02 '14 at 12:18
  • @JukkaK.Korpela, doesn't the accept-charset HTML attribute control the Accept-Charset HTTP header? My Chrome version is "35.0.1916.153 m", but there is no Accept-Charset in the request headers. – Y.L. Jul 03 '14 at 03:35
  • @CyberRusher, no. As I mention in my answer: “The `accept-charset` attribute in a `form` element is not expected to affect HTTP request headers, and it does not. ” – Jukka K. Korpela Jul 03 '14 at 04:21
  • @JukkaK.Korpela, how could I configure my Chrome to send Accept-Charset in HTTP requests? Could you give me some tips? Thank you! – Y.L. Jul 03 '14 at 07:41
  • @CyberRusher, that would be a browser configuration issue and off-topic (for this question and for SO), and I'm afraid it can’t be done. – Jukka K. Korpela Jul 03 '14 at 08:14
  • FYI: The definition of HTTP headers and GET request content is "MUST BE ISO-8859-1", preferably ASCII. Anything in a different charset can only be sent in a body (including POST). So always expect the request headers to be ISO-8859-1. However, setting the form charset permits the POST body to contain UTF-8 regardless of the headers. Browsers differ in how they handle characters in the POST body, and few send a request header saying that the POST content is UTF-8. If you set it on the form, you have to expect it when processing. – ppostma1 Mar 11 '15 at 20:58