
For pages that already declare a Content-Type with a UTF-8 charset (either by HTTP header or by meta tag), is there a benefit to adding accept-charset="UTF-8" to HTML forms?

(I understand the accept-charset attribute is broken in IE for ISO-8859-1, but I haven't heard of a problem with IE and UTF-8. I'm just asking if there's a benefit to adding it with UTF-8, to help prevent invalid byte sequences from being entered.)

philfreo
  • My question is more specific... but related: http://stackoverflow.com/questions/3715264/how-to-handle-user-input-of-invalid-utf-8-characters and http://stackoverflow.com/questions/1317152/am-i-correctly-supporting-utf-8-in-my-php-apps/1317301#1317301 – philfreo Sep 15 '10 at 17:04
  • Related W3C reference: http://www.w3.org/TR/html401/interact/forms.html#adef-accept-charset (note the "may" in `User agents may interpret this value as the character encoding that was used to transmit the document` - does this mean it's safer to explicitly mention it? Not sure. From my experience, I agree with what @elusive says) – Pekka Sep 15 '10 at 17:15

2 Answers


If the page is already interpreted by the browser as being UTF-8, setting accept-charset="utf-8" does nothing.

If you set the encoding of the page to UTF-8 in a <meta> and/or HTTP header, it will be interpreted as UTF-8, unless the user deliberately goes to the View->Encoding menu and selects a different encoding, overriding the one you specified.

In that case, accept-charset would have the effect of setting the submission encoding back to UTF-8 in the face of the user messing about with the page encoding. However, this still won't work in IE, due to the previously discussed problems with accept-charset in that browser.

So it's IMO doubtful whether it's worth including accept-charset to fix the case where a non-IE user has deliberately sabotaged the page encoding (possibly messing up more on your page than just the form).

Personally, I don't bother.
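
To illustrate what "submission encoding" means here, the following is a minimal Python sketch (not browser code) that percent-encodes the same form value as it would be sent under two different page encodings, then checks on the receiving side whether the bytes are valid UTF-8. The field name `name`, the sample value, and the helper `is_valid_utf8` are made up for illustration.

```python
from urllib.parse import urlencode, unquote_to_bytes

value = "café"

# Form data as submitted when the page encoding is UTF-8.
as_utf8 = urlencode({"name": value}, encoding="utf-8")

# The same field if the user has switched the page encoding to windows-1252.
as_cp1252 = urlencode({"name": value}, encoding="windows-1252")

print(as_utf8)    # name=caf%C3%A9
print(as_cp1252)  # name=caf%E9


def is_valid_utf8(raw: bytes) -> bool:
    """Hypothetical server-side check: do these bytes decode as UTF-8?"""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# The UTF-8 submission decodes cleanly; the windows-1252 one is an
# invalid byte sequence from the point of view of a UTF-8-only server.
print(is_valid_utf8(unquote_to_bytes("caf%C3%A9")))  # True
print(is_valid_utf8(unquote_to_bytes("caf%E9")))     # False
```

Whatever the form attributes say, a server-side check along these lines is what actually catches submissions that arrive in some other encoding.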

Darryl Hein
bobince
  • Are you sure? That makes sense but the doc says `may interpret` and that the default is UNKNOWN. – philfreo Sep 16 '10 at 02:52
  • On all browsers (now and historically), `UNKNOWN`/unset always means the current page encoding, whether that was the server's page encoding set in a header/meta, or the encoding explicitly set by the user as an override. Exception that probably doesn't affect you: most browsers will not send form submissions in a non-ASCII-superset encoding like UTF-16 even if the page was served as that. It doesn't really make sense to do so. – bobince Sep 16 '10 at 08:06

I have not encountered any problems using UTF-8 with IE (6+) or any other major browser out there. You need to make sure that a UTF-8 meta tag is set (IE needs this), that all your files are UTF-8 encoded, and that the web server sends a UTF-8 charset header. Then there should not be any problem if you omit accept-charset.
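
As a rough sketch of that setup, assuming a Python backend (the function name `build_utf8_page` and the `/submit` action are hypothetical), the page declares UTF-8 in both the HTTP header and a meta tag, the body is actually encoded as UTF-8, and the form carries no accept-charset:

```python
def build_utf8_page(body_html: str) -> tuple[dict[str, str], bytes]:
    """Return (headers, payload) for a page declared as UTF-8 in both places."""
    # Declaration 1: the HTTP response header.
    headers = {"Content-Type": "text/html; charset=utf-8"}

    # Declaration 2: the meta tag inside the document itself.
    document = (
        "<!DOCTYPE html>\n"
        "<html><head>\n"
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">\n'
        "</head><body>\n"
        f"{body_html}\n"
        "</body></html>\n"
    )

    # The bytes on the wire must really be UTF-8, matching both declarations.
    return headers, document.encode("utf-8")


# A plain form without accept-charset; it inherits the page encoding.
headers, payload = build_utf8_page(
    '<form method="post" action="/submit"><input name="q"></form>'
)
print(headers["Content-Type"])  # text/html; charset=utf-8
```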

jwueller
  • I'm doing those things, sans that form attribute, and I'm still getting some cases of invalid UTF-8 being input (http://stackoverflow.com/questions/3715264/how-to-handle-user-input-of-invalid-utf-8-characters), so I'm trying to find out conclusively if adding this to all my forms will be helpful or unnecessary. – philfreo Sep 15 '10 at 17:24
  • @philfreo: I never used it once and had no problems at all. Can you hand us a link to your page? – jwueller Sep 15 '10 at 17:41
  • If your page is really being properly served as UTF-8, you shouldn't get non-UTF-8 submissions from that form. Of course, if you've got other sites embedding a form that points to your site, or automated agents submitting content in general, all bets are off. – bobince Sep 15 '10 at 22:47
  • Our server is serving all pages as UTF-8, and we aren't (intentionally) receiving data from other sources. We aren't getting a lot of invalid UTF-8, but we do get some every once in a while. As my other question indicates, I'm looking for an overall approach to solving that. With this question, I was hoping to hear conclusively whether the `accept-charset` attribute is necessary (makes any difference) given a UTF-8 HTTP header. – philfreo Sep 15 '10 at 23:50