
In HTML attribute name=value pairs, which characters are allowed in the 'name' portion? Looking at some common attributes, it appears that only letters (a-z and A-Z) are used, but what other characters could be allowed as well? Maybe digits (0-9), hyphens (-), and periods (.)? Is there a spec for this?

Deduplicator
Robin Rodricks

5 Answers


It depends what you mean by "allowed". Each tag has a fixed list of valid attribute names, and in HTML they are case-insensitive. In one important sense, only these characters in the correct sequence are "allowed".

Another way of looking at it is which characters browsers will treat as part of a valid attribute name. The best guide here is the HTML5 parser spec: https://html.spec.whatwg.org/multipage/syntax.html#attributes-2

It says that all characters except tab, line feed, form feed, space, solidus (/), greater-than sign (>), quotation mark ("), apostrophe (') and equals sign (=) will be treated as part of the attribute name. Personally, I wouldn't push the edge cases of this, though.
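As a rough illustration of that rule, here is a minimal JavaScript sketch that checks a candidate string against the terminating characters listed above (the same set Nate turns into a regex in the comments). `parsedAttributeName` is a hypothetical helper, not a browser API. It also mimics the tokenizer's folding of ASCII A-Z to lowercase, but it ignores NULL replacement, end-of-file and other parse-error edge cases, so treat it as an approximation rather than the real tokenizer.

```javascript
// Characters that, per the list above, end (or split) an attribute name.
const NAME_TERMINATORS = /[\t\n\f \/>"'=]/;

// Returns the name as the parser would record it, or null if the string
// contains a character that would end the name early.
function parsedAttributeName(candidate) {
  if (candidate.length === 0 || NAME_TERMINATORS.test(candidate)) {
    return null;
  }
  // Only ASCII A-Z is lowercased; other letters (e.g. "Ä") are left alone.
  return candidate.replace(/[A-Z]/g, (c) => c.toLowerCase());
}

console.log(parsedAttributeName("data-Foo"));       // → "data-foo"
console.log(parsedAttributeName("@click.prevent")); // → "@click.prevent"
console.log(parsedAttributeName("a=b"));            // → null
```

Note how permissive this is: `@click.prevent` survives intact, which is exactly the kind of edge case the advice above suggests not relying on.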

waldyrious
Alohci
  • Answers my question. "all characters except ... will be treated as part of the attribute name": kudos on finding this info, and in a spec too! – Robin Rodricks May 30 '09 at 05:27
  • Yeah, don't "push it". Some rather common characters will have to be escaped in CSS selectors, others will break your editor's syntax highlighting, etc. – Rolf Dec 02 '13 at 00:19
  • For reference, the regex would be `/([^\t\n\f \/>"'=]+)/` – Nate Mar 04 '14 at 22:06
  • *“Each tag has a fixed list of attribute names which are valid”*, unless the tag is a [custom element](http://www.html5rocks.com/en/tutorials/webcomponents/customelements/). Then you can define attributes yourself. – tomekwi Apr 11 '15 at 22:04
  • @Nate I would say that your suggested regex is missing the exception for the [Control Characters](https://en.wikipedia.org/wiki/Control_character#In_Unicode) and also the `NULL` value. I came up with the following regex for the exceptions `[ \u0000-\u001F\u007F\u0080-\u009F"'>\/=]` ([link](https://regex101.com/r/uU4vT6/3)) but I am missing the "any characters that are not defined by Unicode" part from [the spec](https://html.spec.whatwg.org/multipage/syntax.html#attributes-2). Any suggestions on how to easily validate this part? – PauloASilva Jun 22 '16 at 10:23
  • It would probably make sense to switch to a positive matching regex instead of a negative one. In other words, use `[...]` instead of `[^...]` though I don't have the time at the moment to put one together. (I see in the link you are using a negative match but you use a positive one in your comment. A typo?) – Nate Jun 22 '16 at 12:51
  • A negative match is faster if done right. I suggest the following PCRE `/^[^ "'>\/=\p{Cc}]++$/Du`. Note that this will validate Unicode as well, which should suffice to satisfy the _any characters that are not defined by Unicode_ constraint for most use cases. – Fleshgrinder Sep 05 '16 at 14:42
  • Small hint for fellow programmers: if you have `data-foo` as an attribute name, you'll have trouble with the JavaScript `MyElem.data-foo;`. Use `MyElem.getAttribute("data-foo");` instead. – manuell Jan 30 '18 at 15:19
  • It isn't really possible to properly represent Unicode in HTML attributes. Which Unicode representation? UTF-8? UTF-32 with a BOM? What about HTML entities? How do you handle non-Unicode charsets? In addition, `data-*` attribute names further restrict allowed characters to the XML-compatible character set, which, again, makes difficult or impossible assumptions about Unicode representations in HTML. Also, #xB7 has no valid use case. So, accounting for all this, the only allowed characters for an attribute name should be `[a-z][a-z0-9_.-]*`. Anything beyond that is highly suspect as junk. – CubicleSoft Feb 09 '18 at 15:11
  • I found out that commas are not supported in attribute names in Chrome. – Mikaël Mayer May 15 '18 at 19:08
  • @MikaëlMayer - Can you provide evidence for that? Using a comma in an attribute name works for me in Chrome. – Alohci May 15 '18 at 21:33
  • `d = document.createElement("div"); d.setAttribute(",", "hello")` – Mikaël Mayer May 16 '18 at 18:14
  • @MikaëlMayer - Ah OK. Yes, that's a DOM restriction, not an HTML restriction. For setAttribute(), the attribute name must match the `Name` production, as given in S.Lott's answer. – Alohci May 16 '18 at 21:44
  • @CubicleSoft With XML/HTML, the document text encoding defines how the remainder of the byte-level data is treated, including element tags and their names. If unspecified by the `?xml` or `meta charset`, the default is UTF-8, no BOM. This of course still precludes any characters explicitly disallowed by spec. I believe HTML element/attribute names must begin with an alphabetic character and only contain alphanumerics and "_" "-". So maybe `[\w][\w\d_-]*`. But where are people getting the `, .` etc. from? I've never seen that in any HTML spec, and symbols in identifiers are usually a bit :-/ – Beejor May 24 '19 at 23:56
  • @manuell No, `MyElem.dataFoo` will work fine. Dashed entries are automatically changed to camelCase entries. – PRMan Apr 13 '22 at 17:32
  • @PRMan Why do you say "No"? *Yes*, `MyElem.data-foo` will not work! IMHO the dash-style to camelCase conversion rule is bonkers. Another way to use data-* attributes is with `MyElem.dataset.foo`... – manuell Apr 15 '22 at 08:48
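To make the `data-foo` exchange in the comments concrete: `MyElem.data-foo` is parsed by JavaScript as `MyElem.data - foo` (a subtraction), which is why it fails, while `MyElem.dataset.foo` works. The sketch below reproduces the name mapping that `dataset` applies, following the HTML spec's description as I understand it: only attributes starting with `data-` and containing no ASCII uppercase letters are exposed, the prefix is dropped, and each `-` followed by a lowercase ASCII letter becomes that letter uppercased. `datasetKey` is a hypothetical helper name for illustration only.

```javascript
// Maps a content attribute name to its dataset property name,
// or returns null if the attribute would not appear in dataset at all.
function datasetKey(attributeName) {
  if (!attributeName.startsWith("data-") || /[A-Z]/.test(attributeName)) {
    return null;
  }
  // Drop "data-", then turn "-x" (x in a-z) into "X".
  return attributeName.slice(5).replace(/-([a-z])/g, (_, c) => c.toUpperCase());
}

console.log(datasetKey("data-foo"));     // → "foo"
console.log(datasetKey("data-foo-bar")); // → "fooBar"
console.log(datasetKey("aria-label"));   // → null

// In a browser, el.dataset.fooBar and el.getAttribute("data-foo-bar")
// read the same underlying attribute.
```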

Since this question was asked, the web has evolved quite a bit. It's likely that authors of Web Components (custom elements) are landing here trying to learn what valid names can be used when defining attributes on custom elements.

There are several answers here that are partially correct, so I'm going to try to aggregate them and update them based on recent specs.

First, in HTML5, attribute names can start with most characters and are much more permissive than in previous versions of HTML. @S.Lott's answer is correct for HTML 2 and XHTML, but not for HTML5.

For HTML5: (spec)

Attribute names must consist of one or more characters other than the space characters, U+0000 NULL, U+0022 QUOTATION MARK ("), U+0027 APOSTROPHE ('), U+003E GREATER-THAN SIGN (>), U+002F SOLIDUS (/), and U+003D EQUALS SIGN (=) characters, the control characters, and any characters that are not defined by Unicode. In the HTML syntax, attribute names, even those for foreign elements, may be written with any mix of lower- and uppercase letters that are an ASCII case-insensitive match for the attribute's name.
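Here is a minimal JavaScript sketch of the rule quoted above, along the lines of the Unicode-aware regex suggested in the comments. It assumes a JavaScript engine with Unicode property escapes (the `u` flag). `\p{Cc}` covers the control characters U+0000-U+001F and U+007F-U+009F, which already includes NULL, tab, LF, FF and CR; `\p{Cn}` is my approximation of "not defined by Unicode" as unassigned code points (noncharacters such as U+FFFF fall in this category too), and its exact coverage depends on the engine's Unicode version. The helper names are hypothetical.

```javascript
// One or more characters, none of which is a space, ", ', >, /, =,
// a control character (Cc) or an unassigned code point (Cn).
const HTML5_ATTR_NAME = /^[^ "'>\/=\p{Cc}\p{Cn}]+$/u;

function isHtml5AttributeName(name) {
  return HTML5_ATTR_NAME.test(name);
}

// The quoted text says names match ASCII case-insensitively:
// only A-Z and a-z fold together; "É" and "é" stay distinct.
function sameAttributeName(a, b) {
  const fold = (s) => s.replace(/[A-Z]/g, (c) => c.toLowerCase());
  return fold(a) === fold(b);
}

console.log(isHtml5AttributeName("a,b"));            // → true
console.log(isHtml5AttributeName("a=b"));            // → false
console.log(sameAttributeName("onClick", "onclick")); // → true
```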

That being said, other commenters here are correct: when you use an attribute on a built-in element that's not in its list of valid attributes, you're technically violating the spec. Browsers are very tolerant of this, though, so in practice it does little (if any) harm. A lot of libraries exploit this to enhance regular HTML tags, which causes some confusion, since it's technically not valid HTML. HTML5 provides a mechanism for custom data in attributes via the `data-*` attribute naming convention.

These rules are different for custom elements.

Custom element authors are welcome to implement any attributes they like on their elements; however, the attribute names are more restricted than in HTML5. In fact, the spec requires that the attribute name follow the XML Name restrictions:

The ASCII symbols and punctuation marks, along with a fairly large group of Unicode symbol characters, are excluded from names because they are more useful as delimiters in contexts where XML names are used outside XML documents; providing this group gives those contexts hard guarantees about what cannot be part of an XML name. The character #x037E, GREEK QUESTION MARK, is excluded because when normalized it becomes a semicolon, which could change the meaning of entity references.

Names and Tokens

[4] NameStartChar ::= ":" | [A-Z] | "_" | [a-z] | [#xC0-#xD6] | [#xD8-#xF6] | [#xF8-#x2FF] | [#x370-#x37D] | [#x37F-#x1FFF] | [#x200C-#x200D] | [#x2070-#x218F] | [#x2C00-#x2FEF] | [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD] | [#x10000-#xEFFFF]

[4a] NameChar ::= NameStartChar | "-" | "." | [0-9] | #xB7 | [#x0300-#x036F] | [#x203F-#x2040]

[5] Name ::= NameStartChar (NameChar)*

[6] Names ::= Name (#x20 Name)*

[7] Nmtoken ::= (NameChar)+

[8] Nmtokens ::= Nmtoken (#x20 Nmtoken)*

So, for attribute names on custom elements, you can use upper- or lowercase ASCII letters, "_" underscore, ":" colon, or any of the Unicode characters called out in the spec as the start character, and then any of those plus hyphens "-", periods ".", digits 0-9, and the extra Unicode ranges listed in NameChar as the remaining characters.
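Here is a minimal sketch that transcribes productions [4], [4a] and [5] above into a JavaScript regex, assuming the `u` flag so that `\u{...}` escapes denote code points (needed for the #x10000-#xEFFFF range). `isXmlName` is a hypothetical helper. It checks the grammar only; the DOM methods browsers actually ship (e.g. `setAttribute`) may not match it exactly.

```javascript
// [4] NameStartChar, transcribed range by range.
const NAME_START_CHAR = String.raw`:A-Z_a-z\u{C0}-\u{D6}\u{D8}-\u{F6}\u{F8}-\u{2FF}\u{370}-\u{37D}\u{37F}-\u{1FFF}\u{200C}-\u{200D}\u{2070}-\u{218F}\u{2C00}-\u{2FEF}\u{3001}-\u{D7FF}\u{F900}-\u{FDCF}\u{FDF0}-\u{FFFD}\u{10000}-\u{EFFFF}`;

// [4a] NameChar = NameStartChar plus "-", ".", digits, #xB7 and two ranges.
const NAME_CHAR = NAME_START_CHAR + String.raw`\-.0-9\u{B7}\u{300}-\u{36F}\u{203F}-\u{2040}`;

// [5] Name ::= NameStartChar (NameChar)*
const XML_NAME = new RegExp(`^[${NAME_START_CHAR}][${NAME_CHAR}]*$`, "u");

function isXmlName(name) {
  return XML_NAME.test(name);
}

console.log(isXmlName("data-foo")); // → true
console.log(isXmlName("1abc"));     // → false (a digit cannot start a Name)
console.log(isXmlName("a,b"));      // → false (comma is not a NameChar)
```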

Clayton Gulick
  • Great work. In 2022, this remains the up-to-date answer and should display at the top of the list. It's not ideal that two answers from 2009 outrank this answer from 2018. – Rounin Jun 15 '22 at 17:41

Assuming you're talking about XHTML, the XML rules apply.

See http://www.w3.org/TR/2008/REC-xml-20081126/#NT-Name

Names and Tokens

[4]     NameStartChar      ::=      ":" | [A-Z] | "_" | [a-z] | [#xC0-#xD6] | [#xD8-#xF6] | [#xF8-#x2FF] | [#x370-#x37D] | [#x37F-#x1FFF] | [#x200C-#x200D] | [#x2070-#x218F] | [#x2C00-#x2FEF] | [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD] | [#x10000-#xEFFFF]
[4a]    NameChar       ::=      NameStartChar | "-" | "." | [0-9] | #xB7 | [#x0300-#x036F] | [#x203F-#x2040]
[5]     Name       ::=      NameStartChar (NameChar)*
[6]     Names      ::=      Name (#x20 Name)*
[7]     Nmtoken    ::=      (NameChar)+
[8]     Nmtokens       ::=      Nmtoken (#x20 Nmtoken)*
S.Lott
  • By the way, not all of these rules work in browsers. Try `document.body.setAttribute('\u1fff', 1)`: that will throw an error. – dy_ Apr 02 '20 at 03:44

Maybe I'm missing something, but I believe the question is based on a false assumption. In HTML, attributes are strictly defined according to a fixed specification. If you 'make up' your own attribute names, you are no longer writing valid HTML.

Daan

The allowed values are listed at w3.org. If you add a custom attribute, you aren't writing HTML anymore.

bluish
Quentin