56

In HTTP, a request can declare which content types the client accepts in responses using the Accept header, with values such as application/xml. The media type syntax allows parameters, such as charset=utf-8, indicating that the client accepts that content in the specified character set.

There is also the Accept-Charset header, which specifies the character encodings the client accepts.

If both headers are specified and the Accept header contains content types with the charset parameter, which header should the server give precedence to?

e.g.:

Accept: application/xml; q=1,
        text/plain; charset=ISO-8859-1; q=0.8
Accept-Charset: UTF-8
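To make the two sources of charset information concrete, here is a minimal sketch of how these headers decompose. The function names `parse_accept` and `parse_accept_charset` are hypothetical, and the parsing is deliberately naive: it splits on commas and semicolons without handling quoted strings, and it does not distinguish media type parameters from the extension parameters that may follow `q`.

```python
def parse_accept(value):
    """Split an Accept header into (media_range, params, q) tuples."""
    entries = []
    for item in value.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_range = parts[0].lower()
        if not media_range:
            continue  # skip empty elements such as a trailing comma
        params = {}
        q = 1.0  # the quality factor defaults to 1 when absent
        for part in parts[1:]:
            name, _, val = part.partition("=")
            name = name.strip().lower()
            val = val.strip().strip('"')
            if name == "q":
                q = float(val)
            else:
                params[name] = val
        entries.append((media_range, params, q))
    return entries


def parse_accept_charset(value):
    """Split an Accept-Charset header into (charset, q) tuples."""
    # Accept-Charset elements have the same "token; q=..." shape,
    # so the Accept parser can be reused and the params dropped.
    return [(name, q) for name, _params, q in parse_accept(value)]
```

Applied to the example above, the Accept header yields a `text/plain` entry carrying `charset=ISO-8859-1` at q=0.8, while Accept-Charset yields only UTF-8, which is exactly the apparent conflict.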

I've sent a few example requests to various servers using Fiddler to test how they respond:

Examples

W3

Request

GET http://www.w3.org/ HTTP/1.1
Host: www.w3.org
Accept: text/html;charset=UTF-8
Accept-Charset: ISO-8859-1

Response

Content-Type: text/html; charset=utf-8

Google

Request

GET http://www.google.co.uk/ HTTP/1.1
Host: www.google.co.uk
Accept: text/html;charset=UTF-8
Accept-Charset: ISO-8859-1

Response

Content-Type: text/html; charset=ISO-8859-1

StackOverflow

Request

GET http://stackoverflow.com/ HTTP/1.1
Host: stackoverflow.com
Accept: text/html;charset=UTF-8
Accept-Charset: ISO-8859-1

Response

Content-Type: text/html; charset=utf-8

Microsoft

Request

GET http://www.microsoft.com/ HTTP/1.1
Host: www.microsoft.com
Accept: text/html;charset=UTF-8
Accept-Charset: ISO-8859-1

Response

Content-Type: text/html

There doesn't seem to be any consensus around what the expected behaviour is. I am trying to look surprised.
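For anyone who wants to repeat the experiment without Fiddler, a minimal sketch using Python's standard `http.client` might look like the following. The helper names `conflicting_headers` and `probe` are hypothetical, and the results will not necessarily match the ones above: server behaviour changes over time, and a plain HTTP request may simply be answered with a redirect.

```python
import http.client


def conflicting_headers(accept_charset="UTF-8", header_charset="ISO-8859-1"):
    """Build request headers whose two charset hints disagree."""
    return {
        "Accept": f"text/html;charset={accept_charset}",
        "Accept-Charset": header_charset,
    }


def probe(host, headers=None, timeout=10):
    """Send GET / to host and return the response's Content-Type header."""
    conn = http.client.HTTPConnection(host, timeout=timeout)
    try:
        conn.request("GET", "/", headers=headers or conflicting_headers())
        response = conn.getresponse()
        # None is returned when the server sends no Content-Type at all.
        return response.getheader("Content-Type")
    finally:
        conn.close()


if __name__ == "__main__":
    for host in ("www.w3.org", "www.google.co.uk", "stackoverflow.com", "www.microsoft.com"):
        print(host, "->", probe(host))
```

Swapping the two arguments of `conflicting_headers` gives the reverse case (ISO-8859-1 in Accept, UTF-8 in Accept-Charset), which helps tell a server that honours one header from a server that always returns the same encoding.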

Paul Turner
  • I think W3 is the only one of your examples worth testing; all of the others appear to completely ignore the `Accept-Charset` header. – Sam Feb 13 '13 at 01:05
  • 2
    Perhaps you've mis-read: W3 and SO ignored the `Accept-Charset` header, Google honoured it and Microsoft pretended that text-encoding isn't a thing. – Paul Turner Feb 13 '13 at 08:18
  • 1
    I'm referring to the *current* behaviour of the four web servers. I tested each of them with different required character encodings and determined the following: W3 ignores the one in the `Accept` header, Google ignores both, SO ignores both, and Microsoft doesn't tell you the response's character set. I'm mentioning this because, for example, you implied that Google honoured your request, but that's probably just a coincidence because Google always seems to return ISO-8859-1. None of the four web servers seems to do prioritisation or process the charset in the `Accept` header. – Sam Feb 14 '13 at 02:31
  • @Tragedian, you should test with the opposites as well; that is, include a new case with ISO-8859-1 in `Accept` and utf-8 in `Accept-Charset`, for all your test cases. – Pacerier Jul 10 '13 at 09:45

6 Answers

39

Although you can set a media type in the Accept header, the charset parameter for that media type is not defined anywhere in RFC 2616 (though it is not forbidden).

Therefore, if you are going to implement an HTTP 1.1 compliant server, you should first look for the Accept-Charset header, and then search for your own parameters in the Accept header.
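The lookup order described here could be sketched roughly as follows. This is only an illustration under stated assumptions: `pick_charset` and `_parse_item` are hypothetical names, `headers` is assumed to be a plain dict with exactly these key spellings, and `supported` is whatever set of encodings the server can actually produce.

```python
def _parse_item(item):
    """Return (token, params) for one comma-separated header element."""
    token, *raw_params = [p.strip() for p in item.split(";")]
    params = {}
    for raw in raw_params:
        name, _, value = raw.partition("=")
        params[name.strip().lower()] = value.strip().strip('"')
    return token.lower(), params


def pick_charset(headers, supported=("utf-8", "iso-8859-1")):
    """Pick a response charset, consulting Accept-Charset before Accept."""
    accept_charset = headers.get("Accept-Charset", "")
    accept = headers.get("Accept", "")

    # Step 1: the dedicated header is consulted first.
    candidates = []
    for item in filter(None, (i.strip() for i in accept_charset.split(","))):
        name, params = _parse_item(item)
        candidates.append((float(params.get("q", 1)), name))

    # Step 2: only without Accept-Charset, look at charset parameters
    # attached to the media ranges in Accept.
    if not candidates:
        for item in filter(None, (i.strip() for i in accept.split(","))):
            _media, params = _parse_item(item)
            if "charset" in params:
                candidates.append((float(params.get("q", 1)), params["charset"].lower()))

    # Highest q first; sorted() is stable, so header order breaks ties.
    for q, name in sorted(candidates, key=lambda c: -c[0]):
        if q > 0 and (name == "*" or name in supported):
            return supported[0] if name == "*" else name
    return None  # nothing usable was requested; the caller decides what to do
```

With the headers from the question, this returns `utf-8` because Accept-Charset is present; remove that header and the `charset=ISO-8859-1` parameter on `text/plain` is used instead.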

Paulo
  • 1
    To clarify: you'd expect a server to give priority to the `Accept-Charset` header. If the header isn't present, then look for `charset` parameters in content types? – Paul Turner Sep 10 '11 at 06:20
  • 1
    Considering the RFC 2616 spec, that's the way it should work. But that depends solely on how the developer who implemented the RFC interpreted it. If they give priority to the Accept header over Accept-Charset, they may be reasoning that Accept simply has priority (since it is more specialized) over Accept-Charset (which is more generic, but is the standard). The final thought is: if you are making the request, make it consistent. If you are developing an HTTP server, build it so that the RFC standard has priority over your interpretation. This way you can’t be wrong. – Paulo Sep 13 '11 at 14:16
12

Read RFC 2616 Section 14.1 and 14.2. The Accept header does not allow you to specify a charset. You have to use the Accept-Charset header instead.

Remy Lebeau
  • 2
    RFC 2616 indicates it's acceptable for content-types to contain parameters. As per RFC 2046, `charset` is a specific parameter: http://tools.ietf.org/html/rfc2046#section-4.1.2 I don't see anything which specifically forbids this parameter in `Accept` or clarifies how to handle those parameters when they are used with `Accept-Charset` – Paul Turner Sep 05 '11 at 17:13
  • 1
    The `Accept` header does not use the `Content-Type` specification, so it is not correct to include `Content-Type` parameters in the `Accept` header. Please read the RFC syntax more carefully. The `Accept` header has its own kind of parameters (primarily for assigning priorities), and `charset` is not one of them. The `Accept-Charset` header is the official way to specify acceptable charsets. – Remy Lebeau Sep 05 '11 at 18:13
  • The RFC reads: "The media-range MAY include media type parameters that are applicable to that range." followed by "Each media-range MAY be followed by one or more accept-params, beginning with the "q" parameter for indicating a relative quality factor." It's quite clear that media-type parameters are acceptable. – Paul Turner Sep 06 '11 at 06:29
  • 2
    The use of a `charset` parameter in the `Accept` header is not defined anywhere, though. The `Accept-Charset` header is defined for that purpose. That is what servers are going to be looking for. – Remy Lebeau Sep 07 '11 at 01:36
  • 2
    charset is valid. See [RFC 2616 section 3.7](http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html), which specifically delegates to IANA and specifically states "The presence or absence of a parameter might be significant to the processing of a media-type, depending on its definition within the media type registry." – Tom Howard Mar 16 '12 at 00:46
  • For the record, Tomcat 8 with a basic Spring MVC application does not honor `Accept-Charset`. I had to include the request header `Accept: application/json; charset=UTF-8` to switch the response from the default latin-1 to UTF-8, even though my Linux system running both the Tomcat JVM and the `curl` client is fully set up for UTF-8. – Yves Martin Apr 04 '16 at 10:05
  • 2
    For what it's worth, `latin-1` is just plain wrong. According to [RFC 4627 Section 3](https://tools.ietf.org/html/rfc4627#section-3), JSON can only be Unicode, and [Section 6](https://tools.ietf.org/html/rfc4627#section-6) specifies that the MIME type is `application/json` with no parameters, optional or mandatory. – Mark Slater Nov 01 '16 at 22:04
  • Charset is valid. "Each media-range might be followed by zero or more applicable media type parameters (**e.g., charset**)" ([RFC 7231 §5.3.2. Accept](https://tools.ietf.org/html/rfc7231#section-5.3.2)). I'd say `Accept-Charset` is best used to avoid repetition in the `Accept`. Alas, given the complexities and confusion, it's understandable why seemingly many servers simply ignore this header, which is exactly what I am going to do in my HTTP server project. This choice is still spec-compliant as the server may treat "the resource as if it is not subject to content negotiation" (§5.3.3). – Martin Andersson Feb 04 '21 at 11:04
  • 1
    @MartinAndersson note that RFC 7231 didn't exist yet when this question was posted. – Remy Lebeau Feb 04 '21 at 15:20
8

Firstly, Accept headers can accept parameters; see RFC 7231 section 5.3.2.

All text/* mime-types can accept a charset parameter.

The Accept-Charset header allows a user-agent to specify the charsets it supports.

If the Accept-Charset header did not exist, a user-agent would have to specify each charset parameter for each text/* media type it accepted, e.g.

Accept: text/html;charset=US-ASCII, text/html;charset=UTF-8, text/plain;charset=US-ASCII, text/plain;charset=UTF-8
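
The repetition is just the cross product of media types and charsets. As a rough illustration (the function name `expand_accept` is made up for this sketch), the header above can be generated like this:

```python
from itertools import product


def expand_accept(media_types, charsets):
    """Spell out every media type / charset pair as Accept media ranges."""
    return ", ".join(
        f"{media_type};charset={charset}"
        for media_type, charset in product(media_types, charsets)
    )


print("Accept: " + expand_accept(["text/html", "text/plain"], ["US-ASCII", "UTF-8"]))
```

Two media types and two charsets already give four entries; listing the charsets once in Accept-Charset avoids that growth.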
Martin
Malcolm Sparks
5

RFC 7231 section 5.3.2 (Accept) clearly states:

Each media-range might be followed by zero or more applicable media type parameters (e.g., charset)

So a charset parameter for each content-type is allowed. In theory a client could accept, for example, text/html only in UTF-8 and text/plain only in US-ASCII.

But it would usually make more sense to state possible charsets in the Accept-Charset header as that applies to all types mentioned in the Accept header.

If those headers’ charsets don’t overlap, the server could send status 406 Not Acceptable.
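
A rough sketch of such a check, purely as an assumption about how a strict server might combine the two headers (the names `negotiate_charset` and `_elements` are hypothetical, and only exact media type matches are considered, not wildcards such as `text/*`):

```python
def _elements(header):
    """Yield (token, params) for each comma-separated element of a header."""
    for item in header.split(","):
        token, *rest = [p.strip() for p in item.split(";")]
        if token:
            params = dict(
                (k.strip().lower(), v.strip().strip('"').lower())
                for k, _, v in (r.partition("=") for r in rest)
            )
            yield token.lower(), params


def negotiate_charset(accept, accept_charset, media_type, available):
    """Return (status, charset) for a representation of the given media type."""
    # Charsets that the Accept header pins to this exact media type, if any.
    from_accept = {
        params["charset"]
        for token, params in _elements(accept)
        if token == media_type and "charset" in params
    }
    # Charsets listed in Accept-Charset, excluding entries refused with q=0.
    from_header = {
        token
        for token, params in _elements(accept_charset)
        if float(params.get("q", 1)) > 0
    }
    for charset in available:
        ok_accept = not from_accept or charset in from_accept
        ok_header = not from_header or "*" in from_header or charset in from_header
        if ok_accept and ok_header:
            return 200, charset
    # The two headers leave no charset that the server can produce.
    return 406, None
```

With `Accept: text/html;charset=UTF-8` and `Accept-Charset: ISO-8859-1`, as in the question's test requests, this sketch returns 406 because the two sets do not overlap.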

However, I wouldn’t expect fancy cross-matching from a server, for various reasons. It would make the server code more complicated (and therefore more error-prone), while in practice a client would rarely send such requests. Also, nowadays I would expect everything server-side to use UTF-8 and be sent as-is, so there’s nothing to negotiate.

Community
Martin
3

According to the Mozilla Developer Network, you should never use the Accept-Charset header. It's obsolete.

whistling_marmot
0

I don't think it matters. The client is doing something dumb; there doesn't need to be interoperability for that :-)

Julian Reschke
  • 2
    This might be the smart answer, but what happened to being liberal in what we accept? Specifically, if a client wants to indicate that they'd like a text representation in ISO encoding and XML in Unicode encoding, the `charset` parameter is the only way to be that explicit. – Paul Turner Sep 06 '11 at 06:30
  • Yes, I didn't say otherwise. The client is doing something *dumb* when sending *conflicting* information. – Julian Reschke Sep 09 '11 at 08:59
  • 2
    Whether they're conflicting surely is down to how you choose to interpret the values? You could, for example, say that the `charset` parameter is superior to the `Accept-Charset` header. – Paul Turner Sep 09 '11 at 14:31