Following up to Regular expression to match hostname or IP Address? and using Restrictions on valid host names as a reference, what is the most readable, concise way to match/validate a hostname/fqdn (fully qualified domain name) in Python? I've answered with my attempt below, improvements welcome.
9 Answers
import re

def is_valid_hostname(hostname):
    if len(hostname) > 255:
        return False
    if hostname[-1] == ".":
        hostname = hostname[:-1]  # strip exactly one dot from the right, if present
    allowed = re.compile(r"(?!-)[A-Z\d-]{1,63}(?<!-)$", re.IGNORECASE)
    return all(allowed.match(x) for x in hostname.split("."))
This ensures that each segment
- contains at least one character and a maximum of 63 characters
- consists only of allowed characters
- doesn't begin or end with a hyphen.
It also avoids double negatives (`not disallowed`), and if `hostname` ends in a `.`, that's OK, too. It will (and should) fail if `hostname` ends in more than one dot.
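A minimal sketch of a hardened variant, folding in two points raised in the comments on this answer: an empty string makes `hostname[-1]` raise `IndexError`, and `$` also matches just before a trailing newline, so `"example.com\n"` passes. The name `is_valid_hostname_strict` is made up for this illustration; the label rule itself is unchanged, only `str.endswith`, an explicit empty check, and `fullmatch` are swapped in.

```python
import re

# Same label rule as above: 1-63 letters, digits or hyphens, no hyphen at either end.
# No trailing "$" is needed because fullmatch() must consume the whole label.
_LABEL = re.compile(r"(?!-)[A-Z\d-]{1,63}(?<!-)", re.IGNORECASE)

def is_valid_hostname_strict(hostname):
    # Reject the empty string up front instead of indexing into it.
    if not hostname or len(hostname) > 255:
        return False
    if hostname.endswith("."):
        hostname = hostname[:-1]  # strip exactly one trailing dot
    return all(_LABEL.fullmatch(label) for label in hostname.split("."))
```

With this variant, `""` and `"example.com\n"` are both rejected, while `"example.com"` and `"example.com."` are still accepted.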

- Hostname labels should also not end with a hyphen. – bobince Mar 28 '10 at 11:09
- You're using `re.match` incorrectly - mind that `re.match("a+", "ab")` is a match whereas `re.match("a+$", "ab")` isn't. Your function also does not allow for a single dot at the end of the hostname. – AndiDog Mar 28 '10 at 12:02
- I had been under the impression that `re.match` needs to match the entire string, therefore making the end-of-string anchor unnecessary. But as I now found out (thanks!) it only binds the match to the start of the string. I corrected my regex accordingly. I don't get your second point, however. Is it legal to end a hostname in a dot? The Wikipedia article linked in the question appears to say no. – Tim Pietzcker Mar 28 '10 at 12:22
- @Tim Pietzcker Yes, a single dot at the end is legal. It marks the name as a fully-qualified domain name, which lets the DNS system know that it shouldn't try appending the local domain to it. – Daniel Stutzbach Mar 28 '10 at 13:16
- Note that there's also a 63-character limit for each segment, and a global 255-character limit for the whole hostname. – Romuald Brunet Mar 28 '10 at 13:19
- Just wondering, why isn't it named isValidFQDN(fqdn)? There seems to be some grumbling that "domain name" is in the term, but hostname can have other connotations as well, and this is obviously tuned to the FQDN designation. (Great little module, BTW, I appreciate the effort!) – Jiminion Jul 25 '13 at 19:12
- @Jim: Oh, I don't know the RFCs well enough for this. Feel free to edit the post to rename the function; if the community deems that a valid edit, it will be approved - if not, there is no harm done. – Tim Pietzcker Jul 25 '13 at 19:17
- Nitpick: This will raise an exception when passed an empty string `''`. For robustness, you probably want to avoid that with an additional check. – ron rothman Feb 13 '14 at 19:24
- Works well. The only exception I've come across is when there's a line break in the middle of the domain; it still passes. It will happen, for those of you who are also checking millions of domains. – User Aug 27 '14 at 00:51
- Another nitpick: this will return True for a hostname ending in more than two dots. Edited the function accordingly. – andreas-h Mar 30 '16 at 08:04
- @andreas-h: No, it doesn't. If there are two or more dots at the end, only the last one will be removed; then the `split()` will result in one or more empty strings, and empty strings fail the regex because of the `{1,63}` quantifier. – Tim Pietzcker Mar 30 '16 at 10:15
- Regardless of RFC 1035, a label in the hostname can start and end with a hyphen perfectly fine. For example `dig @8.8.8.8 "www-.---example-.kintis.net"`. There are registrars that even allow domain registration with hyphens in the beginning or the end! – Panagiotis Oct 14 '16 at 14:21
- Underscore is also a valid character: http://stackoverflow.com/questions/2180465/can-domain-name-subdomains-have-an-underscore-in-it – mootmoot Apr 04 '17 at 14:53
- @mootmoot: Isn't that about domain names as opposed to host names? – Tim Pietzcker Apr 04 '17 at 15:28
- At the moment, underscore is valid for both domain names and hostnames, except that domain registrars don't allow the underscore during registration. In addition, IDN domain names are going to break your code, so I just added an answer with an extra step that converts the hostname to punycode before feeding it into your program. – mootmoot Apr 04 '17 at 15:53
- Using this regex works fine in my case: `regex = "^((?!-)[A-Za-z0-9-]{1,63}(?<!-)\\.)+[A-Za-z]{2,6}$"` Ref: https://www.geeksforgeeks.org/how-to-validate-a-domain-name-using-regular-expression/ – Sanjeev singh May 16 '20 at 18:48
- this implements [RFC1123](https://tools.ietf.org/html/rfc1123#page-13) – milahu Dec 31 '20 at 08:39
Here's a bit stricter version of Tim Pietzcker's answer with the following improvements:
- Limit the length of the hostname to 253 characters (after stripping the optional trailing dot).
- Limit the character set to ASCII (i.e. use `[0-9]` instead of `\d`).
- Check that the TLD is not all-numeric.
import re

def is_valid_hostname(hostname):
    if hostname[-1] == ".":
        # strip exactly one dot from the right, if present
        hostname = hostname[:-1]
    if len(hostname) > 253:
        return False

    labels = hostname.split(".")

    # the TLD must be not all-numeric
    if re.match(r"[0-9]+$", labels[-1]):
        return False

    allowed = re.compile(r"(?!-)[a-z0-9-]{1,63}(?<!-)$", re.IGNORECASE)
    return all(allowed.match(label) for label in labels)
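The ASCII restriction listed above matters because, for `str` patterns in Python 3, `\d` matches any Unicode decimal digit, not just `0` to `9`. A small illustration (the variable name `arabic_indic` is just for this example):

```python
import re

arabic_indic = "\u0661\u0662\u0663"  # the digits 1, 2, 3 in Arabic-Indic script

# \d accepts Unicode decimal digits when matching str patterns.
print(bool(re.fullmatch(r"\d+", arabic_indic)))            # True
# An explicit ASCII range does not.
print(bool(re.fullmatch(r"[0-9]+", arabic_indic)))         # False
# The re.ASCII flag restricts \d to [0-9] as well.
print(bool(re.fullmatch(r"\d+", arabic_indic, re.ASCII)))  # False
```

Either `[0-9]` or the `re.ASCII` flag keeps such digits out of a label.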

- According to RFC 3696 ( https://tools.ietf.org/html/rfc3696#section-2 ), only the TLD should not be numeric, so I'd say that the last condition should look like `if re.match(r"\.(\d+)$", hostname):` – Minras Nov 28 '16 at 14:43
- This assumes a TLD is just the last label in the FQDN. That is not correct. There are many TLDs that are made up of 2 or 3 labels (e.g. co.uk). Here's a good source for public suffixes: https://github.com/publicsuffix/list/blob/master/public_suffix_list.dat – bbak May 02 '19 at 16:23
- @bbak Did you just randomly downvote my answer? This is still a stricter validation than suggested in other answers here, including the [accepted](https://stackoverflow.com/a/2532344/244297) one. – Eugene Yarmash May 03 '19 at 07:07
Per The Old New Thing, the maximum length of a DNS name is 253 characters. (One is allowed up to 255 octets, but 2 of those are consumed by the encoding.)
import re

def validate_fqdn(dn):
    if dn.endswith('.'):
        dn = dn[:-1]
    if len(dn) < 1 or len(dn) > 253:
        return False
    ldh_re = re.compile('^[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?$',
                        re.IGNORECASE)
    return all(ldh_re.match(x) for x in dn.split('.'))
One could argue for accepting empty domain names, or not, depending on one's purpose.
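The two octets "consumed by the encoding" can be checked with a little arithmetic. This is a sketch assuming the RFC 1035 wire format, where every label is prefixed by one length octet and the name ends with the zero-length root label; `wire_length` is a helper name invented for this illustration.

```python
def wire_length(name):
    """Octets needed to encode `name` in DNS wire format."""
    if name.endswith("."):
        name = name[:-1]
    labels = name.split(".")
    # One length octet per label, plus the zero-length root label that ends the name.
    return sum(1 + len(label) for label in labels) + 1

# Three 63-character labels and one 61-character label, joined by dots.
longest = ".".join(["a" * 63] * 3 + ["a" * 61])
print(len(longest), wire_length(longest))  # 253 255
```

Each dot in the text form is replaced by a length octet, so the only net additions are the first label's length octet and the terminating root octet.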

I like the thoroughness of Tim Pietzcker's answer, but I prefer to offload some of the logic from regular expressions for readability. Honestly, I had to look up the meaning of those `(?` "extension notation" parts. Additionally, I feel the "double-negative" approach is more obvious in that it limits the responsibility of the regular expression to just finding any invalid character. I do like that `re.IGNORECASE` allows the regex to be shortened.
So here's another shot; it's longer but it reads kind of like prose. I suppose "readable" is somewhat at odds with "concise". I believe all of the validation constraints mentioned in the thread so far are covered:
import re

def isValidHostname(hostname):
    if len(hostname) > 255:
        return False
    if hostname.endswith("."):  # A single trailing dot is legal
        hostname = hostname[:-1]  # strip exactly one dot from the right, if present
    disallowed = re.compile(r"[^A-Z\d-]", re.IGNORECASE)
    return all(  # Split by labels and verify individually
        (label and len(label) <= 63  # length is within proper range
         and not label.startswith("-") and not label.endswith("-")  # no bordering hyphens
         and not disallowed.search(label))  # contains only legal characters
        for label in hostname.split("."))

- You don't need the backslashes as line continuators - they are implicit in the enclosing parentheses. – Tim Pietzcker Mar 29 '10 at 06:09
- This returns `True` for "1.1.1.1" (and any other all-numeric hostname). – Eugene Yarmash Oct 19 '15 at 13:42
Complementary to @TimPietzcker's answer: underscore is a valid hostname character (but not for domain names), and a double dash is commonly found in IDN punycode domains (e.g. `xn--`). The port number should be stripped. Here is the cleaned-up code.
import re

def is_valid_hostname(hostname):
    if len(hostname) > 255:
        return False
    hostname = hostname.rstrip(".")
    allowed = re.compile(r"(?!-)[A-Z\d\-_]{1,63}(?<!-)$", re.IGNORECASE)
    return all(allowed.match(x) for x in hostname.split("."))

# convert your unicode hostname to punycode (python 3)
# Remove the port number from hostname
normalise_host = hostname.encode("idna").decode().split(":")[0]
is_valid_hostname(normalise_host)
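To make the punycode step concrete, here is a small sketch using Python's built-in `idna` codec, which implements the older IDNA 2003 rules. The helper name `to_ascii_hostname` is made up for this example.

```python
def to_ascii_hostname(hostname):
    """Return the ASCII (punycode) form using Python's built-in "idna" codec."""
    return hostname.encode("idna").decode("ascii")

# "b\u00fccher" is "bücher"; the non-ASCII label gains an "xn--" prefix.
print(to_ascii_hostname("b\u00fccher.example"))  # xn--bcher-kva.example
# Plain ASCII names pass through unchanged.
print(to_ascii_hostname("example.com"))          # example.com
```

The resulting `xn--` label is why a validator that runs after this step has to tolerate consecutive hyphens inside a label.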

- Underscore is NOT valid - see https://stackoverflow.com/a/2183140/500902 and its linked IETF specs. Many things work fine when an _ is in a hostname, but various correctly conforming software may fail when, e.g., parsing a URL where the hostname has an '_'. – Marvin Sep 13 '19 at 14:54
- Thanks for pointing out `encode("idna")`, though. If it wasn't for the `:` I'd upvote - the original question didn't mention anything about that. – HumbleEngineer Feb 09 '21 at 10:49
def is_valid_host(host):
    '''IDN compatible domain validator'''
    # decode back to str so the str pattern below can match it on Python 3
    host = host.encode('idna').decode('ascii').lower()
    if not hasattr(is_valid_host, '_re'):
        import re
        is_valid_host._re = re.compile(r'^([0-9a-z][-\w]*[0-9a-z]\.)+[a-z0-9\-]{2,15}$')
    return bool(is_valid_host._re.match(host))

- What's this, obfuscated python? Why the magic with maybe making a regexp an attribute on the function? – kaleissin Mar 21 '13 at 11:13
- It's so the re.compile only has to be done once, instead of every time the function is called. Probably only matters if you're calling this function many times per second. – btubbs Nov 19 '15 at 19:06
- @btubbs Python already [caches results of `re.compile()`](https://docs.python.org/3/library/re.html?highlight=re#re.compile), unless you are using a lot of different patterns. I agree with @kaleissin that this code is rather obfuscated. Just plug the compiled expression in a private variable in the same scope as `is_valid_host()`. You can still only calculate it on the first call and save it into the variable. – Feuermurmel Jul 11 '16 at 16:39
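A rough sketch of what the last comment suggests, with the compiled pattern held in a module-level private variable rather than a function attribute. The pattern is copied unchanged from the answer; the name `_HOST_RE` and the `UnicodeError` handling are additions made for this illustration.

```python
import re

# Compiled once at import time; the pattern is copied from the answer above.
_HOST_RE = re.compile(r'^([0-9a-z][-\w]*[0-9a-z]\.)+[a-z0-9\-]{2,15}$')

def is_valid_host(host):
    '''IDN compatible domain validator'''
    try:
        # Convert to the ASCII form and back to str before matching.
        ascii_host = host.encode('idna').decode('ascii').lower()
    except UnicodeError:
        # The idna codec rejects some inputs outright, e.g. empty labels.
        return False
    return bool(_HOST_RE.match(ascii_host))
```

Note that this pattern requires at least one dot, so a bare name such as `localhost` is rejected.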
I think this regex might help in Python: '^([a-zA-Z0-9]+(\.|\-))*[a-zA-Z0-9]+$'

- @MofX while this regex is inaccurate, it highlights a particular point others have usually ignored; a hostname part may include some glyphs in the middle but not at the beginning or end. specifically, hostnames must not start with a digit or hyphen, and must not end with a hyphen. and, while not always enforced, the _ is found rather commonly in hostnames when it should only be used to prefix a domain key or service record. it is still valid, however, to attempt to resolve a hostname that consists entirely of an _. i.e. _.example.com is legitimate and applies for RFC 7816. – FirefighterBlu3 Dec 31 '19 at 17:39
- DNS does in fact allow labels to start with and even consist entirely of numbers (although it originally didn't; anecdotally it was changed because 411.com or a similar service requested it). – tripleee Oct 21 '21 at 06:06
Process each DNS label individually by excluding invalid characters and ensuring nonzero length.
import re

def isValidHostname(hostname):
    disallowed = re.compile(r"[^a-zA-Z\d\-]")
    return all(map(lambda x: len(x) and not disallowed.search(x), hostname.split(".")))

- A trailing `.` on the end of a hostname is valid. Oh, and much more work to do if you want to support IDN, of course... – bobince Mar 28 '10 at 11:10
If you're looking to validate the name of an existing host, the best way is to try to resolve it. You'll never write a regular expression to provide that level of validation.
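A minimal sketch of the resolve-it approach, assuming network access and a working resolver; `resolves` is a name chosen here for illustration. A `False` result only means the lookup failed at that moment, not that the name is syntactically invalid.

```python
import socket

def resolves(hostname):
    """Return True if the system resolver can currently look up `hostname`."""
    try:
        socket.getaddrinfo(hostname, None)
    except (socket.gaierror, UnicodeError):
        # gaierror: the lookup failed; UnicodeError: the name could not be
        # encoded for the resolver (for example, an empty label).
        return False
    return True
```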

- And what if he wants to find out if a hostname that does not yet exist will be a legal one? The RFC appears to be quite straightforward, so I don't see why a regex wouldn't work. – Tim Pietzcker Mar 28 '10 at 12:25
- Depends on what you're trying to show. If the name doesn't resolve then who knows what it “means”; the true means of validation require information that a regular expression cannot have (i.e., access to DNS). It's easier to just try it and handle the failure. And when thinking about names that are potentially legal but not yet, the only people who actually need to care about that are the registrars. Everyone else should leave these things to the code that is designed to have genuine expertise in the area. As JWZ notes, applying an RE turns a problem into two problems. (Well, mostly…) – Donal Fellows Mar 28 '10 at 14:01
- i do not agree. there are two separate concerns, and both are valid concerns: (1) argue whether a given string can serve, technically and plausibly, as a, say, valid email address, hostname, such things; (2) demonstrate that a given name is taken, or likely free. (1) is purely a syntactical consideration. since (2) happens over the network, there is a modicum of doubt: a host that is up now can be down in a second, a domain i order now can be taken when my mail arrives. – flow Mar 28 '10 at 15:37
- This approach has been proposed in a similar question (http://stackoverflow.com/questions/399932/can-i-improve-this-regex-check-for-valid-domain-names/401132#401132), and there is even a Python project to facilitate this (http://code.google.com/p/python-public-suffix-list/). I've modified the question title slightly, since I'm not interested in a solution that requires network lookups. – kostmo Mar 28 '10 at 20:29