Are IRIs valid as HTML attribute values?

Question

Is it valid HTML to use IRIs containing non-ASCII characters as attribute values (e.g. for href attributes) instead of URIs? Are there any differences among the HTML flavors (HTML and XHTML, 4 and 5)? At least RFC 3986 seems to imply that it isn't.

I realize that it would probably be safer (regarding older and IRI-unaware software) to use percent encoding, but I'm looking for a definitive answer with regard to the standard.

So far, I've done some tests with the W3C validator, and unescaped Unicode characters in URIs don't trigger any warnings or errors with HTML 4/5 and XHTML 1.x/5 doctypes (but of course the absence of error messages doesn't imply the absence of errors).

At least Chrome supports raw UTF-8 IRIs, but it percent-escapes them before sending an HTTP request. My web server (lighttpd) also seems to accept UTF-8 characters in an HTTP request in both percent-encoded and unencoded form.
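One plausible way a server can accept both forms is to decode percent-escapes at the byte level first and only then interpret the octets as UTF-8, so that a raw UTF-8 path and its percent-encoded equivalent normalize to the same string. The sketch below is a hypothetical illustration of that idea in Python; `normalize_request_path` is an invented name, and this is not a description of how lighttpd actually handles request paths.

```python
from urllib.parse import unquote_to_bytes


def normalize_request_path(raw_path: bytes) -> str:
    """Hypothetical helper: decode a request path into Unicode text.

    Percent-escapes are decoded to raw octets first, then the octets are
    interpreted as UTF-8, so raw and percent-encoded paths compare equal.
    """
    octets = unquote_to_bytes(raw_path)
    # The default (strict) error mode raises UnicodeDecodeError on invalid UTF-8.
    return octets.decode("utf-8")


raw = "/H\u00e5kon".encode("utf-8")  # b'/H\xc3\xa5kon', unencoded UTF-8 on the wire
escaped = b"/H%C3%A5kon"             # the same path, percent-encoded
print(normalize_request_path(raw) == normalize_request_path(escaped))  # True
```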

Beware that [Section 1.2 of RFC 3987](http://tools.ietf.org/html/rfc3987#section-1.2) mentions that HTTP as defined by [RFC 2616](http://tools.ietf.org/html/rfc2616) does **NOT** support IRIs, so handling them is outside the scope of that standard. You (or your browser, or someone) need to map a given IRI to a URI before trying to retrieve the referenced resource. — Oliver, Oct 25 '13 at 12:58
possible duplicate of [Unicode characters in URLs](http://stackoverflow.com/questions/2742852/unicode-characters-in-urls) — Ciro Santilli OurBigBook.com, Aug 29 '14 at 13:49

Alohci · Accepted Answer · 2012-12-29T02:32:29.513

HTML 4.01 is straightforward enough. Different attributes have different rules as to what they can contain, but for the href attribute on an <a> element, section B.2.1 of the HTML 4 spec, "Non-ASCII characters in URI attribute values", says:

... the following href value is illegal:

<A href="http://foo.org/Håkon">...</A>
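To make the HTML 4.01 rule concrete, the following Python sketch uses the standard library's `html.parser` to flag href values that contain non-ASCII characters, i.e. values like the one quoted above. The class name `NonAsciiHrefFinder` is hypothetical, and the check is deliberately narrow: it does not validate anything else about the URI.

```python
from html.parser import HTMLParser


class NonAsciiHrefFinder(HTMLParser):
    """Hypothetical checker: collect href values containing non-ASCII characters.

    Under HTML 4.01 (Appendix B.2.1) such values are not legal; this sketch
    only flags them and performs no other URI validation.
    """

    def __init__(self) -> None:
        super().__init__()
        self.offending: list[str] = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names are lowercased and value may be None.
        for name, value in attrs:
            if name == "href" and value is not None and not value.isascii():
                self.offending.append(value)


finder = NonAsciiHrefFinder()
finder.feed('<A href="http://foo.org/H\u00e5kon">...</A> <a href="/ok">ok</a>')
print(finder.offending)  # ['http://foo.org/Håkon']
```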

HTML5 is different. It says IRIs are valid provided they comply with some additional conditions:

A URL is a valid URL if at least one of the following conditions holds:

The URL is a valid URI reference [RFC3986].

The URL is a valid IRI reference and it has no query component. [RFC3987]

The URL is a valid IRI reference and its query component contains no unescaped non-ASCII characters. [RFC3987]

The URL is a valid IRI reference and the character encoding of the URL's Document is UTF-8 or a UTF-16 encoding. [RFC3987]
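The four conditions can be approximated in a few lines of Python. This is a simplified sketch under a loud assumption: the function `is_valid_html5_url` (a hypothetical name) does NOT check the URL against the RFC 3986/3987 grammars. It assumes the URL is already a syntactically well-formed URI or IRI reference and evaluates only the ASCII-related parts of the conditions.

```python
from urllib.parse import urlsplit

# Encoding labels treated as "UTF-8 or a UTF-16 encoding" (assumption: lowercase labels).
UTF_ENCODINGS = {"utf-8", "utf-16", "utf-16le", "utf-16be"}


def is_valid_html5_url(url: str, document_encoding: str) -> bool:
    """Approximate the HTML5 validity conditions for a well-formed URI/IRI reference."""
    # Condition 1: a URI reference consists of ASCII characters only.
    if url.isascii():
        return True
    # Conditions 2 and 3: an IRI with no query, or a query without raw non-ASCII.
    # urlsplit returns "" when there is no query, and "".isascii() is True.
    if urlsplit(url).query.isascii():
        return True
    # Condition 4: any IRI is allowed when the document is UTF-8 or UTF-16.
    return document_encoding.lower() in UTF_ENCODINGS


print(is_valid_html5_url("http://foo.org/H\u00e5kon", "iso-8859-1"))     # True
print(is_valid_html5_url("http://foo.org/?q=H\u00e5kon", "iso-8859-1"))  # False
print(is_valid_html5_url("http://foo.org/?q=H\u00e5kon", "UTF-8"))       # True
```

The query check matters because the query component is typically encoded using the document's character encoding when the URL is resolved, so a raw non-ASCII query is only unambiguous in a UTF-8 or UTF-16 document.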

XHTML 1.x follows the same rules as HTML 4.01.

XHTML5 is the same as HTML5.

score 3 · Answer 2 · edited Oct 07 '21 at 06:06

When in doubt, read the official HTML specs for definitive answers.

HTML 4 does not support IRIs at all. They must be mapped to URIs per RFC 3987 Section 3.1, that is, non-ASCII characters are encoded as UTF-8 and the resulting bytes are percent-encoded, which is also what HTML4 Section B.2.1 recommends.
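A minimal Python sketch of that mapping, assuming the input is already a valid IRI: every non-ASCII character is encoded as UTF-8 and each byte is written as `%HH`, while ASCII characters (including reserved characters and existing `%HH` escapes) are left untouched. The function name `iri_to_uri` is hypothetical, and the sketch does not perform IDNA (punycode) conversion of the host name, which RFC 3987 allows as an alternative for DNS names.

```python
def iri_to_uri(iri: str) -> str:
    """Hypothetical helper: map an IRI to a URI by percent-encoding non-ASCII characters.

    Each non-ASCII character is encoded as UTF-8 and every resulting octet is
    written as %HH; ASCII characters are copied unchanged. No IDNA is applied.
    """
    parts = []
    for ch in iri:
        if ord(ch) < 128:
            parts.append(ch)
        else:
            parts.extend(f"%{byte:02X}" for byte in ch.encode("utf-8"))
    return "".join(parts)


print(iri_to_uri("http://foo.org/H\u00e5kon"))  # http://foo.org/H%C3%A5kon
```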

HTML 5 supports both URIs and IRIs in all places where URLs are allowed, per HTML5 Section 2.6.

answered Dec 29 '12 at 02:30 by Remy Lebeau
