
When an HTTP client sends a request to a webserver, looking like this:

GET /index.html HTTP/1.1
Host: www.example.com

and the server responds with something like this (examples taken from Wikipedia):

HTTP/1.1 200 OK
Date: Mon, 23 May 2005 22:38:34 GMT
Server: Apache/1.3.3.7 (Unix) (Red-Hat/Linux)
Last-Modified: Wed, 08 Jan 2003 23:11:55 GMT
ETag: "3f80f-1b6-3e1cb03b"
Content-Type: text/html; charset=UTF-8
Content-Length: 131
Accept-Ranges: bytes
Connection: close

<html>
<head>
  <title>An Example Page</title>
</head>
<body>
  Hello World, this is a very simple HTML document.
</body>
</html>

The response contains the field Content-Type: text/html; charset=UTF-8, but that only specifies the encoding of the body, that is, the bytes after the empty line in the response.

Which charset do the request and the response headers (everything before the empty line) use? Is it ASCII, UTF-8, or some other charset?

MinecraftShamrock

2 Answers


It seems to be a little complicated, but the bottom line is that headers must be ASCII.

Sending non-ASCII text in Http POST header

HTTP headers encoding/decoding in Java
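
As a rough illustration of the ASCII rule on the sending side, here is a minimal Python sketch that serializes a request head like the one in the question. The function name build_request_head is made up for this example and is not part of any library; the sketch simply assumes every header value is meant to be plain ASCII and lets the encoder reject anything else.

```python
def build_request_head(method, target, headers):
    """Serialize a request line plus headers into bytes (hypothetical helper)."""
    lines = [f"{method} {target} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # An empty line terminates the header section.
    head = "\r\n".join(lines) + "\r\n\r\n"
    # Encoding as ASCII raises UnicodeEncodeError if any character is
    # outside the ASCII range, so non-ASCII headers never reach the wire.
    return head.encode("ascii")


if __name__ == "__main__":
    print(build_request_head("GET", "/index.html", {"Host": "www.example.com"}))
```

A value such as "café" would raise UnicodeEncodeError here, so the caller has to pick an explicit ASCII-safe encoding for it before it goes into a header.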

Halcyon

This used to be defined (somewhat vaguely) in RFC 2616. However, in June 2014 RFC 2616 was replaced by a series of RFCs starting with RFC 7230, “Hypertext Transfer Protocol (HTTP/1.1): Message Syntax and Routing”. It answers the question more realistically, but still somewhat vaguely, in section 3.2.4:

Historically, HTTP has allowed field content with text in the
ISO-8859-1 charset [ISO-8859-1], supporting other charsets only
through use of [RFC2047] encoding. In practice, most HTTP header
field values use only a subset of the US-ASCII charset [USASCII].
Newly defined header fields SHOULD limit their field values to
US-ASCII octets. A recipient SHOULD treat other octets in field
content (obs-text) as opaque data.

The characters in the basic syntax of HTTP are, of course, ASCII characters. Some headers may contain other data. The character restrictions and the interpretation of bytes are defined for each header. The basic definitions are in RFC 7231. For most headers, the explicit syntax restricts the characters to ASCII. Even in comments, RFC 7231 allows non-ASCII bytes only as an obsolete feature.
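
On the receiving side, one possible way to follow the "opaque data" advice is sketched below in Python. The helper name decode_header_line is hypothetical, and decoding the value as ISO-8859-1 is just one choice made for this sketch: it maps each byte to exactly one character, so the original bytes can always be recovered, and the flag tells the caller that obs-text was present and the text should not be trusted as meaningful.

```python
from typing import Tuple


def decode_header_line(raw: bytes) -> Tuple[str, str, bool]:
    """Split one raw header line into (name, value, has_obs_text)."""
    name, sep, value = raw.partition(b":")
    if not sep:
        raise ValueError("not a header line: missing ':'")
    # Field names are restricted to ASCII; decoding raises
    # UnicodeDecodeError if any other byte appears.
    name_text = name.decode("ascii")
    # Drop optional whitespace around the field value.
    value = value.strip(b" \t")
    # obs-text is any octet in the range 0x80-0xFF.
    has_obs_text = any(byte >= 0x80 for byte in value)
    # ISO-8859-1 maps every byte to one code point, so this never fails
    # and value_text.encode("latin-1") returns the original bytes.
    value_text = value.decode("latin-1")
    return name_text, value_text, has_obs_text


if __name__ == "__main__":
    print(decode_header_line(b"Content-Type: text/html; charset=UTF-8"))
```

When has_obs_text is true, the caller can get the untouched bytes back with value_text.encode("latin-1") and interpret them according to whatever that particular header defines.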

Jukka K. Korpela