30

I'm looking for the regex to validate hostnames. It must completely conform to the standard. Right now, I have

^[0-9a-z]([0-9a-z\-]{0,61}[0-9a-z])?(\.[0-9a-z]([0-9a-z\-]{0,61}[0-9a-z])?)*$

but it allows successive hyphens and hostnames longer than 255 characters. If the perfect regex is impossible, say so.

Edit/Clarification: a Google search didn't reveal that this is a solved (or proven unsolvable) problem. I want to create the definitive regex so that nobody has to write his own ever again. If dialects matter, I want a version for each one in which this can be done.

CannibalSmith

7 Answers

31

^(?=.{1,255}$)[0-9A-Za-z](?:(?:[0-9A-Za-z]|-){0,61}[0-9A-Za-z])?(?:\.[0-9A-Za-z](?:(?:[0-9A-Za-z]|-){0,61}[0-9A-Za-z])?)*\.?$
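
A minimal sketch of how this pattern might be exercised from Python. The function name `is_valid_hostname` and the sample inputs are hypothetical, and `fullmatch` is an assumption about the intended behaviour: Python's `$` would otherwise also accept a name followed by a trailing newline.

```python
import re

# Pattern copied verbatim from the answer above, split across two
# raw strings only for readability.
HOSTNAME_RE = re.compile(
    r"^(?=.{1,255}$)[0-9A-Za-z](?:(?:[0-9A-Za-z]|-){0,61}[0-9A-Za-z])?"
    r"(?:\.[0-9A-Za-z](?:(?:[0-9A-Za-z]|-){0,61}[0-9A-Za-z])?)*\.?$"
)


def is_valid_hostname(name: str) -> bool:
    """Return True if `name` matches the pattern in full.

    fullmatch (rather than match) is used so that a trailing newline,
    which Python's `$` tolerates, is rejected.
    """
    return HOSTNAME_RE.fullmatch(name) is not None


if __name__ == "__main__":
    # Hypothetical sample inputs, chosen to cover the cases raised in
    # the comments: trailing dot, double hyphen, doubled dot, long label.
    samples = [
        "example.com",
        "example.com.",
        "xn--bcher-kva.ch",
        "-example.com",
        "example..com",
        "a" * 64 + ".com",
    ]
    for sample in samples:
        print(f"{sample!r}: {is_valid_hostname(sample)}")
```

Whether the overall limit should be 255, 254 or 253 characters is debated in the comments; the sketch keeps the 255 used by the pattern itself.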

Prof. Falken
CannibalSmith
  • 2
    It doesn't accept Domains with trailing "." but otherwise, works. – nicerobot Sep 14 '09 at 13:52
  • 1
    Fixed. I wonder if the length assertion should check if it's 254 or less excluding the trailing dot instead of just checking if it's 255 or less. Otherwise someone along the line might add the trailing dot to a maximum length hostname and break it. – CannibalSmith Sep 15 '09 at 09:48
  • 1
    The \b before the hyphen is preventing this from matching valid Internationalized Domain Names, e.g. xn--bcher-kva.ch. – Jordan Rieger Nov 21 '12 at 23:14
  • 11
    I know it's just semantics, but this regex validates a FQDN, *not* a hostname. – Jason Antman Jul 03 '13 at 15:36
  • 3
    This matches a name with digits only which is invalid (see RFC 1912: `Labels may not be all numbers, but may have a leading digit`) – looper Jul 22 '14 at 08:19
  • This answer validates invalid hostnames containing multiple dots (example..com). See my more correct answer below: http://stackoverflow.com/a/18494710/2355587 – derekm Mar 12 '17 at 01:43
  • This is a great answer, but here's a shorter one that, with several unit tests I put together at regex101.com, performs even better: https://stackoverflow.com/a/20204811/1982136 – Tim Malone Oct 11 '20 at 02:59
  • @looper [RFC1123](https://tools.ietf.org/html/rfc1123#page-13) allows fully numeric labels – milahu Dec 31 '20 at 08:38
  • The length checking isn't quite right. It should start with ^(?=.{1,253}\.?$) instead. – jschultz410 Feb 07 '22 at 19:59
14

The accepted answer validates invalid hostnames containing multiple dots (example..com). Here is a regex I came up with that I think exactly matches what is allowable under RFC requirements (minus an ending "." supported by some resolvers to short-circuit relative naming and force FQDN resolution).

Spec:

<hname> ::= <name>*["."<name>]
<name> ::= <letter-or-digit>[*[<letter-or-digit-or-hyphen>]<letter-or-digit>]

Regex:

^([a-zA-Z0-9](?:(?:[a-zA-Z0-9-]*|(?<!-)\.(?![-.]))*[a-zA-Z0-9]+)?)$

I've tested quite a few permutations myself, and I think it is accurate.

This regex also does not do length validation. Length constraints on the labels between dots and on the full name are required by the RFC, but lengths can easily be checked as second and third passes after validating against this regex: check the full string length, then split on "." and validate the length of every substring. E.g., in JavaScript, label length validation might look like: `"example.com".split(".").reduce(function (prev, curr) { return prev && curr.length <= 63; }, true)`.
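
The second and third passes described above could look roughly like this. The sketch is in Python rather than JavaScript, and the function name `lengths_ok` and both constants are hypothetical. The 63-character label limit comes from the answer; the 255-character overall limit is an assumption taken from the question, and other answers in this thread argue for 253.

```python
MAX_LABEL_LENGTH = 63   # per-label limit quoted in the answer above
MAX_TOTAL_LENGTH = 255  # assumption: overall limit taken from the question


def lengths_ok(hostname: str,
               max_total: int = MAX_TOTAL_LENGTH,
               max_label: int = MAX_LABEL_LENGTH) -> bool:
    """Length-only checks, meant to run after the syntax regex.

    This does not validate characters or empty labels; that is the
    job of the first (regex) pass.
    """
    # Second pass: length of the whole string.
    if len(hostname) > max_total:
        return False
    # Third pass: length of every dot-separated label.
    return all(len(label) <= max_label for label in hostname.split("."))
```

Keeping the limits as parameters makes it easy to switch to a stricter overall limit, e.g. `lengths_ok(name, max_total=253)`.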


Alternative Regex (without negative lookbehind, courtesy of the HTML Living Standard):

^[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$
Community
derekm
  • 3
    I could not use a negative lookbehind (thanks JS) so I came up with this which is very similar: `^([a-zA-Z0-9]+(-[a-zA-Z0-9]+)*)+(\.([a-zA-Z0-9]+(-[a-zA-Z0-9]+)*))*$` - again it does not check for length but it *does* validate no leading/ trailing/ repeating `-` or `.`. Works on bare hostnames or FQDNs. – thom_nic Jun 15 '17 at 15:11
4

Your answer was relatively close.

But see

For a hostname RE, that perl module produces

(?:(?:(?:(?:[a-zA-Z0-9][-a-zA-Z0-9]*)?[a-zA-Z0-9])[.])*(?:[a-zA-Z][-a-zA-Z0-9]*[a-zA-Z0-9]|[a-zA-Z])[.]?)

I would modify to be more accurate as:

(?:(?:(?:(?:[a-zA-Z0-9][-a-zA-Z0-9]{0,61})?[a-zA-Z0-9])[.])*(?:[a-zA-Z][-a-zA-Z0-9]{0,61}[a-zA-Z0-9]|[a-zA-Z])[.]?)

Optionally anchoring the ends with ^$ to ONLY match hostnames.

I don't think a single RE can accomplish full validation because, according to Wikipedia, there is a 255-character length restriction, which I don't think can be included within that same RE, at least not without a ton of changes. But it's easy enough to just check that the length is <= 255 before running the RE.
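
As a rough illustration of that two-step approach, here is a Python sketch that checks the length first and then applies the modified pattern above. The function name `is_hostname` is hypothetical, and anchoring is done with `fullmatch` instead of adding `^` and `$` to the pattern.

```python
import re

# The modified pattern from this answer, copied verbatim and split
# across two raw strings only for readability.
HOSTNAME_RE = re.compile(
    r"(?:(?:(?:(?:[a-zA-Z0-9][-a-zA-Z0-9]{0,61})?[a-zA-Z0-9])[.])*"
    r"(?:[a-zA-Z][-a-zA-Z0-9]{0,61}[a-zA-Z0-9]|[a-zA-Z])[.]?)"
)

MAX_HOSTNAME_LENGTH = 255  # the limit quoted in this answer


def is_hostname(candidate: str) -> bool:
    """Check the overall length first, then the syntax."""
    if len(candidate) > MAX_HOSTNAME_LENGTH:
        return False
    # fullmatch plays the role of the optional ^...$ anchors.
    return HOSTNAME_RE.fullmatch(candidate) is not None
```

Note that this pattern requires the last label to start with a letter, so a name such as `example.123` is rejected even though its length is fine.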

Community
nicerobot
2

I tried all the answers with the examples below and, unfortunately, none of them passed the test.

ec2-11-111-222-333.cd-blahblah-1.compute.amazonaws.com
domaine.com
subdomain.domain.com
12533d5.dkkkd.com
2dotsextension.co
1dotextension.c
ekkej_dhh.com
12552.2225
112.25.25
12345.com
12345.123.com
domaine.123
whatever
9999-ee.99
email@domain.com
.jjdj.kkd
-subdomain.domain.com
@subdomain.domain.com
112.25.25

Here is a better solution.

^[A-Za-z0-9][A-Za-z0-9-.]*\.\D{2,4}$

Please post any case I have not considered at https://regex101.com/r/89zZkW/1

KooliMed
1

Take a look at the following question. A few of the answers have regular expressions for hostnames.

Could you specify what language you want to use this regex in? Most languages / systems have slightly different regex implementations that will affect people's answers.

Community
JaredPar
  • 1
    I'm using .NET, but I want the regex to be as portable as possible so that other people can use it too. – CannibalSmith Sep 13 '09 at 18:16
  • So long as you maintain your Regex you'll find your earned progress stays extremely portable betwixt environments. – Hardryv Nov 11 '11 at 19:53
0

What about:

^(?=.{1,255})([0-9A-Za-z]|_{1}|\*{1}$)(?:(?:[0-9A-Za-z]|\b-){0,61}[0-9A-Za-z])?(?:\.[0-9A-Za-z](?:(?:[0-9A-Za-z]|\b-){0,61}[0-9A-Za-z])?)*\.?$

This allows exactly one '_' at the beginning (for some SRV records) or a single '*' on its own (for a DNS wildcard label).

nbari
0

According to the relevant internet RFCs and assuming you have lookahead and lookbehind positive and negative assertions:

If you want to validate a local/leaf hostname for use in an internet hostname (e.g., an FQDN), then:

^(?!-)[-a-zA-Z0-9]{1,63}(?<!-)$

That ^^^ is also the general check that a label component inside an internet hostname is valid.

If you want to validate an internet hostname (e.g., an FQDN), then:

^(?=.{1,253}\.?$)(?:(?!-)[-a-zA-Z0-9]{1,63}(?<!-)\.)*(?!-)[-a-zA-Z0-9]{1,63}(?<!-)\.?$
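
A minimal Python sketch of how the two patterns above might be used side by side. The function names `is_valid_label` and `is_valid_fqdn` are hypothetical. Python's `re` supports the lookahead and fixed-width lookbehind assertions used here, and `fullmatch` is an assumption added so that a trailing newline is not accepted by `$`.

```python
import re

# Both patterns are copied verbatim from the answer above.
LABEL_RE = re.compile(r"^(?!-)[-a-zA-Z0-9]{1,63}(?<!-)$")
FQDN_RE = re.compile(
    r"^(?=.{1,253}\.?$)"                     # 253 chars, plus optional root dot
    r"(?:(?!-)[-a-zA-Z0-9]{1,63}(?<!-)\.)*"  # zero or more "label." groups
    r"(?!-)[-a-zA-Z0-9]{1,63}(?<!-)\.?$"     # final label, optional trailing dot
)


def is_valid_label(label: str) -> bool:
    """Check a single local/leaf hostname or label."""
    return LABEL_RE.fullmatch(label) is not None


def is_valid_fqdn(name: str) -> bool:
    """Check a full dotted name, with or without the trailing dot."""
    return FQDN_RE.fullmatch(name) is not None


if __name__ == "__main__":
    # Hypothetical sample inputs.
    for label in ["my-host", "-host", "host-", "a" * 64]:
        print(f"label {label!r}: {is_valid_label(label)}")
    for name in ["example.com", "example.com.", "example..com"]:
        print(f"fqdn {name!r}: {is_valid_fqdn(name)}")
```

Because the length lookahead treats the trailing dot separately, a 253-character name stays valid when a root dot is appended to it.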
jschultz410