275

Can subdomains (domain names) have underscore _ in them?

unor
Daniel Kivatinos
  • 16
    I have taken your question literally: that you really meant DOMAIN NAMES. If, instead, you meant HOST NAMES, edit your question, because the answer will be different. – bortzmeyer Feb 02 '10 at 12:52
  • 2
    "domain name" is an ambiguous term. What it means vary when used in a DNS settings vs when viewed in a "registration" setting, aka when you are about to register a given name. You can't register a domain name with an underscore because in the registration plane a domain name is in fact more an hostname in the DNS terminology and hence more restrictive in allowed characters (but then there are IDNs that allow characters outside of ASCII...). As a domain name in the DNS sense of it, any character is allowed. – Patrick Mevzek Feb 17 '22 at 15:10

12 Answers

457

Most answers given here are false. It is perfectly legal to have an underscore in a domain name. Let me quote the standard, RFC 2181, section 11, "Name syntax":

The DNS itself places only one restriction on the particular labels that can be used to identify resource records. That one restriction relates to the length of the label and the full name. [...] Implementations of the DNS protocols must not place any restrictions on the labels that can be used. In particular, DNS servers must not refuse to serve a zone because it contains labels that might not be acceptable to some DNS client programs.

See also the original DNS specification, RFC 1034, section 3.5 "Preferred name syntax" but read it carefully.

Domains with underscores are very common in the wild. Check _jabber._tcp.gmail.com or _sip._udp.apnic.net.

The other RFCs mentioned here deal with different things. The original question was about domain names. If the question is about host names (or about URLs, which include a host name), then the answer is different: the relevant standard is RFC 1123, section 2.1 "Host Names and Numbers", which limits host names to letters, digits and hyphen.
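To make the distinction concrete, here is a minimal Python sketch, not a definitive validator. The function names `is_valid_dns_name` and `is_valid_hostname` are hypothetical: the first applies only the length limits that the DNS itself imposes, the second additionally applies the letters-digits-hyphen rule for host names.

```python
import re

# Letters-digits-hyphen (LDH) label: starts and ends with a letter or digit.
LDH_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?")


def split_labels(name: str) -> list[str]:
    """Split a dotted name into labels, ignoring one trailing root dot."""
    if name.endswith("."):
        name = name[:-1]
    return name.split(".")


def is_valid_dns_name(name: str) -> bool:
    """Length-only check: 1..63 octets per label, 255 octets on the wire."""
    encoded = [label.encode("utf-8") for label in split_labels(name)]
    if any(not 1 <= len(label) <= 63 for label in encoded):
        return False
    # Wire format: one length octet per label plus the terminating root octet.
    return sum(len(label) + 1 for label in encoded) + 1 <= 255


def is_valid_hostname(name: str) -> bool:
    """DNS length limits plus the LDH restriction on every label."""
    return is_valid_dns_name(name) and all(
        LDH_LABEL.fullmatch(label) is not None for label in split_labels(name)
    )
```

Under these assumptions, `_jabber._tcp.gmail.com` passes `is_valid_dns_name` but fails `is_valid_hostname`, which is exactly the difference described above.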

bortzmeyer
  • 114
    +1 for the difference between "domain names" and "host names" – Alnitak Feb 02 '10 at 10:03
  • 3
    The question (unless it was edited) is about subdomains ie. hostnames. You're not wrong about your factual statements, except pointing out that answers are false, based on how the question is currently worded. – redreinard Apr 11 '14 at 02:36
  • 8
    I'm confused, 1034 says "The labels must follow the rules for ARPANET host names. They must start with a letter, end with a letter or digit, and have as interior characters only letters, digits, and hyphen." Which part of that allows an underscore? – claudekennilol Sep 16 '16 at 19:01
  • 4
    The wording is confusing. URLs can't have underscores. A URL is always a FQDN, it's not a host name. A FQDN can have an empty host name, in this case FQDN = domain. `_jabber._tcp.gmail.com` is not a domain, it's a FQDN. Because URLs can't have underscore in them, you'll probably never be able to buy a domain with an underscore in it. So, even tho domains could also have underscores from a DNS syntax point of view, you will never encounter any, unless it's a local one. – Capsule Mar 17 '17 at 01:11
  • 1
    I can't see the quote in 2.1 of rfc1123 that mentions anything about hyphens being allowed. I can see in the rfc952 that a name can be . Is that what you were referring to? – AJP Apr 02 '17 at 15:59
  • 2
    "The labels must follow the rules for ARPANET host names." in RFC 1034 is in section 3.5 Preferred name syntax. These were preferences not requirements of the DNS. That section begins "The DNS specifications attempt to be as general as possible." and section 3.2 ends with "Node labels which use special characters, leading digits, etc., are likely to break older software which depends on more restrictive choices." - because arbitrary octet strings are allowed, the only restrictions being 0 to 63 octets per label and 255 octets total in a name, including label lengths. – Ian May 07 '17 at 19:25
  • Is this answer still a valid conclusion given the ballot linked in [this answer](https://stackoverflow.com/a/54670381)? The general impression I get is that the practice is now strongly discouraged. – jlmt Jun 15 '19 at 09:50
  • 2
    @Alnitak I'm very confused, **websites domains are HOSTNAMES or DOMAINNAMES ?** Can you give me only 1 example of a domain name starts with `_` that I can test in my browser ? – Accountant م Sep 03 '19 at 10:16
  • 2
    @Accountantم the "domain name" portion of a URL is used to look up an `A` or `AAAA` record. It must therefore be further restricted to the set of legal _hostnames_ (since A and AAAA records point to _hosts_). – Alnitak Sep 03 '19 at 12:48
  • @Alnitak Yeeees, thank you very much now I got it. I asked you because I trust you, and thanks also for your [old help](https://stackoverflow.com/questions/37093965/how-to-prevent-the-browser-from-requesting-the-assets-come-from-the-ajax-respons#comment61731834_37094048) before, I remember your name. Have a good day – Accountant م Sep 03 '19 at 13:28
  • If underscore is valid in domain name then what is regular expression in javascript to validate an email address which has 320 characters (64 characters before the @ symbol, 1 character for @ symbol and rest 255 characters for domain name with underscore.) Please share your solution with me. Thanks a lot in advance. – Kamlesh Sep 23 '19 at 20:11
  • 1
    @Kamlesh Comments are for discussing the answer, not asking additional questions. If you're looking for someone to write your code for you I suggest you contact a consulting agency to find an appropriate contractor for your use-case. – augurar Nov 06 '19 at 00:42
  • A bit more clarification: a URL need not contain an FQDN or hostname at all; that depends heavily on the scheme. For example, a mailto: URL is a valid URL and will never have a hostname. A DNS hostname could contain underscores, but hostnames in URLs whose scheme follows the CISS specification should not. That is because FQDNs in URLs are not necessarily DNS names, which is a very common misconception. The URL RFC is completely separate from DNS and has no relation to it at all. We connect them because URLs are mostly HTTP URLs and HTTP URLs contain DNS names. – Gabor Garami May 06 '20 at 16:12
  • A distinction between DNS and X.509 certs which protect HTTPS sites. While DNS permits underscores in names, X.509 certs can no longer contain underscores (see below). If your site has underscores and you attempt to get an SSL/TLS certificate you will have problems. Best advice is to not use underscores for HTTP(s) related DNS entries. – DavidG May 22 '20 at 13:36
114

A note on terminology, in furtherance to Bortzmeyer's answer

One should be clear about definitions. As used here:

  • domain name is the identifier of a resource in a DNS database
  • label is the part of a domain name in between dots
  • hostname is a special type of domain name which identifies Internet hosts

The hostname is subject to the restrictions of RFC 952 and the slight relaxation of RFC 1123.

RFC 2181 makes clear that there is a difference between a domain name and a hostname:

...[the fact that] any binary label can have an MX record does not imply that any binary name can be used as the host part of an e-mail address...

So underscores in hostnames are a no-no, underscores in domain names are a-ok.

In practice, one may well see hostnames with underscores. As the Robustness Principle says: "Be conservative in what you send, liberal in what you accept".

A note on encoding

In the 21st century, it turns out that hostnames as well as domain names may be internationalized! This means resorting to encodings in case of labels that contain characters that are outside the allowed set.

In particular, it allows one to encode the _ in hostnames (Update 2017-07: This is doubtful, see comments. The _ still cannot be used in hostnames. Indeed, it cannot even be used in internationalized labels.)

The first RFC for internationalization was RFC 3490 of March 2003, "Internationalizing Domain Names in Applications (IDNA)". Today, we have:

  • RFC 5890 "IDNA: Definitions and Document Framework"
  • RFC 5891 "IDNA: Protocol"
  • RFC 5892 "The Unicode Code Points and IDNA"
  • RFC 5893 "Right-to-Left Scripts for IDNA"
  • RFC 5894 "IDNA: Background, Explanation, and Rationale"
  • RFC 5895 "Mapping Characters for IDNA 2008"

You may also want to check the Wikipedia Entry

RFC 5890 introduces the term LDH (Letter-Digit-Hyphen) label for labels used in hostnames and says:

This is the classical label form used, albeit with some additional restrictions, in hostnames (RFC 952). Its syntax is identical to that described as the "preferred name syntax" in Section 3.5 of RFC 1034 as modified by RFC 1123. Briefly, it is a string consisting of ASCII letters, digits, and the hyphen with the further restriction that the hyphen cannot appear at the beginning or end of the string. Like all DNS labels, its total length must not exceed 63 octets.

Going back to simpler times, this Internet draft is an early proposal for hostname internationalization. Hostnames with international characters may be encoded using, for example, 'RACE' encoding.

The author of the 'RACE encoding' proposal notes:

According to RFC 1035, host parts must be case-insensitive, start and end with a letter or digit, and contain only letters, digits, and the hyphen character ("-"). This, of course, excludes any internationalized characters, as well as many other characters in the ASCII character repertoire. Further, domain name parts must be 63 octets or shorter in length.... All post-converted name parts that contain internationalized characters begin with the string "bq--". (...) The string "bq--" was chosen because it is extremely unlikely to exist in host parts before this specification was produced.

David Tonhofer
  • 1
    On a side note, "Systems such as DomainKeys and service records use the underscore as a means to assure that their special character is not confused with hostnames. For example, _http._sctp.www.example.com specifies a service pointer for an SCTP capable webserver host (www) in the domain example.com." ([link](https://en.wikipedia.org/wiki/Hostname#Restrictions_on_valid_host_names)) – x-yuri Jul 20 '15 at 17:22
  • Ignore the RACE encoding portions; IDN already converts internationalized characters to ASCII using the 'xn--' prefix. – mootmoot Apr 04 '17 at 15:56
  • So above it was suggested that the host name is also what some refer to as the subdomain. Is this the case or are they separate? I.E - `my-subdomain.google.com` I would reference the `my-subdomain` part as a subdomain. But above it appears these are host name. Could you confirm or clarify, please? – Nelda.techspiress Jun 06 '17 at 02:11
  • 2
    @Nelda.techspiress It's been some time but according to [RFC 1034: Domain Names - Concepts and Facilities](https://tools.ietf.org/html/rfc1034), what is called a "subdomain" of a domain `bar.baz.` (for example) is just the collection of domain names that are hierarchically underneath `bar.baz.`, e.g. `a.bar.baz.`, `f.g.bar.baz.`, `h.bar.baz.`, etc. This "subdomain" may or may not include actual _hostnames_. – David Tonhofer Jun 06 '17 at 19:42
  • 2
    In daily usage, one may tend to incorrectly call the string `a.bar.baz` (a domain name) "a subdomain of" the string `bar.baz` (another domain name). The domain names (DNS database resources) `a.bar.baz` and `bar.baz` may or may not be _hostnames_. – David Tonhofer Jun 06 '17 at 19:42
  • 1
    On [page 8 of RFC 1034](https://tools.ietf.org/html/rfc1034#page-8), we read: _A domain is identified by a domain name, and consists of that part of the domain name space that is at or below the domain name which specifies the domain. A domain is a subdomain of another domain if it is contained within that domain. This relationship can be tested by seeing if the subdomain's name ends with the containing domain's name. For example, A.B.C.D is a subdomain of B.C.D, C.D, D, and " "._ – David Tonhofer Jun 06 '17 at 19:43
  • 1
    [RFC 5892](https://tools.ietf.org/html/rfc5892) "The Unicode Code Points and IDNA" does **NOT** list `_` (U+005F) as an allowed code point. Not sure why @DavidTonhofer claims that it does. – Maxim Vladimirsky Jul 06 '17 at 23:04
  • 1
    @MaximVladimirsky Indeed, the appendix says `_` is disallowed. Now I'm confused. This means `_` is disallowed if the label is internationalized (as opposed to what I assumed, `_` is allowed even in hostnames if the label is internationalized). Trying with perl: Run `perl -e 'use Net::IDN::Encode ":all"; print domain_to_ascii($ARGV[0]);' -- müller.example.com` gives `xn--m14ller-xwa2143e.example.com`. With `muller.exämple.com` gives `muller.xn--exmple-jha71c.com`, with `muller.exa_mple.com` gives `muller.exa_mple.com` but `muller.exä_mple.com` gives `disallowed_STD3_valid character U+005F`. – David Tonhofer Jul 07 '17 at 11:33
51

There is one additional thing you may need to know: if the host or subdomain part of the URL contains an underscore, IE9 (I have not tested other versions) cannot write cookies.

So be careful about that. :-)

Kai Mattern
16

Clarifying bortzmeyer and David Tonhofer, domain name and subdomain name labels can contain leading underscores, but nowhere else.

As David Tonhofer wrote, labels are the parts between the periods and should follow the LDH rule, except when they are service labels or port labels, which use an underscore to set them apart from regular labels. The underscore must then occur at the beginning of the label, and the rest of the label should be a "Short Name" from the Service Name and Port Number Registry, a port number with no leading 0s, or a protocol (e.g. tcp, udp). Service labels are further limited to 15 characters.

  • RFC2782 specifies prefixing service record subdomains with underscores.
  • RFC6698 specifies prefixing port numbers with underscores in TLSA certificate records.
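
As an illustration of the two conventions listed above, here is a short Python sketch. The helper names `srv_owner_name` and `tlsa_owner_name` are hypothetical, and the 15-character limit is simply the one stated in this answer.

```python
def srv_owner_name(service: str, proto: str, domain: str) -> str:
    """Build an RFC 2782 style owner name: _service._proto.domain."""
    if not 1 <= len(service) <= 15:
        raise ValueError("service name must be 1 to 15 characters")
    return f"_{service}._{proto}.{domain}"


def tlsa_owner_name(port: int, proto: str, host: str) -> str:
    """Build an RFC 6698 style owner name: _port._proto.host."""
    if not 0 < port <= 65535:
        raise ValueError("port out of range")
    # Formatting an int never yields leading zeros.
    return f"_{port}._{proto}.{host}"


print(srv_owner_name("jabber", "tcp", "gmail.com"))     # _jabber._tcp.gmail.com
print(tlsa_owner_name(443, "tcp", "www.example.com"))   # _443._tcp.www.example.com
```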

Contrary to David Tonhofer's answer, IDN does not allow for encoding the underscore ('_' U+005F LOW LINE) or any other invalid ASCII character.

From RFC5890

[..] two new subsets of LDH labels are created by the introduction of IDNA. These are called Reserved LDH labels (R-LDH labels) and Non-Reserved LDH labels (NR-LDH labels). Reserved LDH labels, known as "tagged domain names" in some other contexts, have the property that they contain "--" in the third and fourth characters but which otherwise conform to LDH label rules.

Punycode encodes all ASCII codepoints as ASCII directly, including the underscore. The resulting R-LDH label would not conform to the LDH label rules. For example, Σ_.com would be encoded as xn--_-zmb.com, which violates the rules. There may be a homographic codepoint which looks like an underscore that can be coded legally (perhaps '＿' U+FF3F FULLWIDTH LOW LINE), but these kinds of codepoints would be categorized as DISALLOWED by RFC5892 under 2.3 IgnorableProperties as a Noncharacter_Code_Point.
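
The encoding claim can be checked with Python's built-in `punycode` codec. This is only a sketch: it feeds the already lowercased label `σ_` to the codec instead of running a full IDNA mapping, and both `to_ace_label` and the `LDH_LABEL` pattern are my own shorthand, not part of any standard library API.

```python
import re

# LDH rule: letters, digits, hyphen; no hyphen at either end.
LDH_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?")


def to_ace_label(label: str) -> str:
    """Punycode a single label and add the 'xn--' prefix (no IDNA checks)."""
    return "xn--" + label.encode("punycode").decode("ascii")


ace = to_ace_label("σ_")
print(ace)                                   # xn--_-zmb
print(LDH_LABEL.fullmatch(ace) is not None)  # False
```

The underscore is copied through unchanged, so the resulting label fails the LDH check.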

RACE (the other proposed IDN encoding scheme) was not accepted as a standard by IETF and should not be used.

Andrew Domaszek
  • 2
    Finally. Can't believe this is the only post in the whole page that even talks about punycode. – Pacerier Jan 13 '17 at 01:57
  • "domain name and subdomain name labels can contain leading underscores, but nowhere else." Not true at all. You can completely have `foo_bar TXT gotcha` in the DNS. At any level. You are focusing on `SRV` records that do have a specific syntax, but they are far from the only record types available in the DNS. – Patrick Mevzek Feb 17 '22 at 15:11
10

Recently the CAB-forum (*) decided that

All certificates containing an underscore character in any dNSName entry and having a validity period of more than 30 days MUST be revoked prior to January 15, 2019. https://cabforum.org/2018/11/12/ballot-sc-12-sunset-of-underscores-in-dnsnames/

This means that you are no longer allowed to use underscores in domains that will have an SSL/TLS certificate.

(*) The Certification Authority Browser Forum (CA/Browser Forum) is a voluntary gathering of leading Certificate Issuers (as defined in Section 2.1(a)(1) and (2) below) and vendors of Internet browser software and other applications that use certificates (Certificate Consumers, as defined in Section 2.1(a)(3) below).

Cie6ohpa
  • They weren't allowed before. That was a grace period after some CAs were caught violating the rules and wanted time to transition their customers. – Matt Nordhoff May 25 '21 at 15:24
6

I followed the link to RFC1034 and read most of it and was surprised to see this:

The labels must follow the rules for ARPANET host names. They must start with a letter, end with a letter or digit, and have as interior characters only letters, digits, and hyphen. There are also some restrictions on the length. Labels must be 63 characters or less.

For clarification, domain names are made up of labels, which are separated by dots ".". This spec must be outdated because it doesn't mention the use of underscores. I can understand the confusion if anybody stumbles over this spec without knowing it is obsolete. It is obsolete, isn't it?

I followed the link to RFC2181 and read some of it. Especially where it pertains to the issue of what is an authoritative, or canonical, name and the issue of what makes a valid DNS label.

As posted earlier it states there's only a length restriction then to sum it up it reads:

(about names and valid labels)

These are already adequately specified, however the specifications seem to be sometimes ignored. We seek to reinforce the existing specifications.

Kind of leaves me wondering if "a length only restriction" is "adequate". Are we going to start seeing domain names like @#$%!! soon? Isn't the internet screwed up enough?

Ted Cambron
  • 3
    No, it is not obsolete. RFC1034 is a specification about *host names*, a special case of *domain names*, which are generic identifiers of resources in the DNS database. For example, the "host" part of URIs is defined rather relaxedly (http://tools.ietf.org/html/rfc3986#section-3.2.2) but the RFC cautions: "A host identified by a registered name is a sequence of characters usually intended for lookup within a locally defined host or service name registry ... a registered name intended for lookup in the DNS uses the syntax defined in Section 3.5 of [RFC1034] and Section 2.1 of [RFC1123]." – David Tonhofer Jan 30 '13 at 15:31
5

As of 2023, there are websites appearing on Google search whose subdomains contain underscores, like https://my_sarisari_store.typepad.com

2

Here are my 2 cents from the Java world:

From a Spark Scala console, with Java 8:

scala> new java.net.URI("spark://spark_master").getHost
res10: String = null

scala> new java.net.URI("spark://spark-master").getHost
res11: String = spark-master

scala> new java.net.URI("spark://spark_master.google.fr").getHost
res12: String = null

scala> new java.net.URI("spark://spark.master.google.fr").getHost
res13: String = spark.master.google.fr

scala> new java.net.URI("spark://spark-master.google.fr:3434").getHost
res14: String = spark-master.google.fr

scala> new java.net.URI("spark://spark-master.goo_gle.fr:3434").getHost
res15: String = null

It's definitely a bad idea ^^

Tripp Kinetics
Thomas Decaux
2

Individual TLDs can place their own rules and restrictions on domain names as they see fit, for example to accommodate local languages.

For example, according to CIRA, Canada's .ca domain names may contain:

  • Letters a through z, and the following accented characters: é ë ê è â à æ ô œ ù û ü ç î ï ÿ. Note that Domain Names are not case sensitive. This means there will be no distinction made between upper case letters and lower case letters (A = a);

  • The numbers 0123456789, and

  • The hyphen character ("-") (although it cannot be used to start or end a Domain Name).

The maximum length is 63 characters, except each accented character reduces that limit by 4 characters.

(Source)
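
The quoted rules can be sketched in Python as follows. This is an illustration of the rules as quoted here, not CIRA's actual validation logic; `is_valid_ca_label` is a hypothetical name, and the input is assumed to use precomposed (NFC) accented characters.

```python
ACCENTED = set("éëêèâàæôœùûüçîïÿ")
PLAIN = set("abcdefghijklmnopqrstuvwxyz0123456789-")


def is_valid_ca_label(label: str) -> bool:
    """Check one label against the .ca rules quoted above (a sketch only)."""
    label = label.lower()  # names are not case sensitive
    if not label or label[0] == "-" or label[-1] == "-":
        return False
    if any(ch not in PLAIN and ch not in ACCENTED for ch in label):
        return False
    accented = sum(ch in ACCENTED for ch in label)
    # Each accented character lowers the 63-character limit by 4.
    return len(label) <= 63 - 4 * accented
```

Note that an underscore is rejected here simply because it is not in either allowed set.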


Incidentally, this allows for around 4 Quadragintillion domain name possibilities (not counting sub-domains) for dot-ca domains.

ashleedawg
2

Regardless of the host name vs. domain name discussion, it is definitely a very bad idea to use underscores in the host part of a URL. It will cause you grief. It may well work in a browser, but in one case I ran into recently an app refused to make a TLS connection with a perfectly valid wildcard certificate for *.s3.amazonaws.com because the wildcard host name part had an underscore in it and would not validate. I believe the underlying library used OpenSSL.

user1572039
-2

I just created a local project (with Vagrant) and it worked perfectly when accessed via its IP address. Then I added some_name.test to the hosts file and tried accessing it that way, but I kept getting "bad request - 400". I wasted hours until I figured out that simply changing the domain name to some-name.test solves the problem. So at least locally on Mac OS it does not work.

MilanG
-2

No, you can't use an underscore in a subdomain, but you can use a hyphen (dash); i.e., my-subdomain.agahost.com is acceptable and my_subdomain.agahost.com is not.

  • Somebody should tell Microsoft this. IIS allows underscores in subdomains, and then one wonders as a layman why there are problems on some systems ... – The incredible Jan Sep 13 '21 at 13:49