Does anyone know how to write a regexp that only allows a-zA-Z0-9.- (letters, digits, dots, and dashes) but never starts or ends with a dot or dash?

I tried this one:

/^[^.-][a-zA-Z0-9.-]+[^.-]$/

... but if I write something like "john@", it matches, which I don't want because @ is not allowed.

Prince John Wesley
user1018527

11 Answers

107

Subdomain

According to the pertinent internet recommendations (RFC 3986 section 3.2.2, which in turn refers to RFC 1034 section 3.5 and RFC 1123 section 2.1), a subdomain (which is a part of a DNS domain host name) must meet several requirements:

  • Each subdomain part must have a length no greater than 63.
  • Each subdomain part must begin and end with an alpha-numeric (i.e. letters [A-Za-z] or digits [0-9]).
  • Each subdomain part may contain hyphens (dashes), but may not begin or end with a hyphen.

Here is an expression fragment for a subdomain part which meets these requirements:

[A-Za-z0-9](?:[A-Za-z0-9\-]{0,61}[A-Za-z0-9])?

Note that this expression fragment should not be used alone - it requires the incorporation of boundary conditions in a larger context, as demonstrated in the following expression for a DNS host name...
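
As a minimal sketch of why the boundary conditions matter, the fragment can be anchored with ^ and $ to validate a single subdomain part on its own. The names LABEL_RE, UNANCHORED_RE and isValidLabel are hypothetical helpers introduced only for this illustration:

```javascript
// The subdomain-part fragment from above, anchored to the whole string.
// LABEL_RE and isValidLabel are illustrative names, not part of the answer.
const LABEL_RE = /^[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?$/;

// The same fragment without anchors, to show what goes wrong when used alone.
const UNANCHORED_RE = /[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?/;

function isValidLabel(label) {
    return LABEL_RE.test(label);
}

console.log(isValidLabel("a"));            // true: a single alphanumeric
console.log(isValidLabel("my-host"));      // true: hyphen in the middle
console.log(isValidLabel("-a"));           // false: leading hyphen
console.log(isValidLabel("a-"));           // false: trailing hyphen
console.log(isValidLabel("a".repeat(64))); // false: longer than 63 characters

// Without anchors the pattern finds the "a" inside "-a" and reports a match.
console.log(UNANCHORED_RE.test("-a"));     // true
```

The last line shows the behavior to expect when the fragment is tested without anchors: it succeeds on any string that merely contains a valid part.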

DNS host name

A named host (not an IP address) must meet additional requirements:

  • The host name may consist of multiple subdomain parts, each separated by a single dot.
  • The length of the overall host name should not exceed 255 characters.
  • The top level domain, (the rightmost part of the DNS host name), must be one of the internationally recognized values. The list of valid top level domains is maintained by IANA.ORG. (See the bare-bones current list here: http://data.iana.org/TLD/tlds-alpha-by-domain.txt).

With this in mind, here is a commented regex (in PHP syntax), which will pseudo-validate a DNS host name. (Note that this incorporates a modified version of the above expression for a subdomain and adds comments to it as well.)

Update 2016-08-20: Since this answer was originally posted back in 2011, the number of top-level domains has exploded. As of August 2016 there are now more than 1400. The original regex in this answer incorporated all of these, but this is no longer practical. The new regex below incorporates a different expression for the top-level domain. The algorithm comes from: Top Level Domain Name Specification draft-liman-tld-names-06.

$DNS_named_host = '%(?#!php/i DNS_named_host Rev:20160820_0800)
    # Match DNS named host domain having one or more subdomains.
    # See: http://stackoverflow.com/a/7933253/433790
    ^                     # Anchor to start of string.
    (?!.{256})            # Whole domain must be 255 or less.
    (?:                   # One or more sub-domains.
      [a-z0-9]            # Subdomain begins with alpha-num.
      (?:                 # Optionally more than one char.
        [a-z0-9-]{0,61}   # Middle part may have dashes.
        [a-z0-9]          # Subdomain ends with alpha-num.
      )?                  # Subdomain length from 1 to 63.
      \.                  # Required dot separates subdomains.
    )+                    # End one or more sub-domains.
    (?:                   # Top level domain (length from 1 to 63).
      [a-z]{1,63}         # Either traditional-tld-label = 1*63(ALPHA).
    | xn--[a-z0-9]{1,59}  # Or an idn-label = Restricted-A-Label.
    )                     # End top level domain.
    $                     # Anchor to end of string.
    %xi';  // End $DNS_named_host.

Note that this expression is not perfect. It requires one or more subdomains, but technically, a host can consist of a TLD having no subdomain (but this is rare).
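
For readers working in JavaScript, here is a rough sketch of the same pattern. JavaScript has no free-spacing (x) flag, so the commented pieces are joined from strings instead. The names LABEL, TLD, DNS_NAMED_HOST and isDnsNamedHost are assumptions made for this example and do not come from the answer:

```javascript
// One subdomain part: starts and ends with an alphanumeric, 1 to 63 characters.
const LABEL = "[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?";
// Top level domain: a traditional alphabetic label or an "xn--" IDN label.
const TLD = "(?:[a-z]{1,63}|xn--[a-z0-9]{1,59})";

const DNS_NAMED_HOST = new RegExp(
    "^(?!.{256})" +            // whole name must be 255 characters or fewer
    "(?:" + LABEL + "\\.)+" +  // one or more subdomain parts, each followed by a dot
    TLD + "$",                 // top level domain, then end of string
    "i"                        // case-insensitive, like the PHP original
);

function isDnsNamedHost(name) {
    return DNS_NAMED_HOST.test(name);
}

console.log(isDnsNamedHost("www.example.com"));   // true
console.log(isDnsNamedHost("WWW.Example.COM"));   // true
console.log(isDnsNamedHost("example"));           // false: no subdomain part
console.log(isDnsNamedHost("-bad.example.com"));  // false: part starts with a hyphen
console.log(isDnsNamedHost("example.c0m"));       // false: digit in the top level domain
```

As in the PHP version, a bare top level domain such as "example" is rejected because at least one subdomain part is required.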

Update 2014-08-12: Added simplified expression for subdomain which does not require alternation.

Update 2016-08-20: Modified DNS host name regex to (more generally) match the new vast number of valid top level domains. Also, trimmed out unnecessary material from answer.

Community
ridgerunner
  • Hmm, I think a double '--' is also not valid but possible with this regex, right? – algorhythm Nov 26 '13 at 14:43
  • @algorhythm - My interpretation of the RFCs is that a double hyphen is perfectly valid, but each subdomain part may not begin or end with a hyphen. – ridgerunner Nov 26 '13 at 16:11
  • Note that as of 2016, there are many more allowed TLDs than the provided DNS hostname regex allows. – Qqwy Mar 02 '16 at 21:46
  • @Qqwy - Yes, you are absolutely correct. When I get some time I'll update the answer to reflect this. Thanks for the comment! – ridgerunner Mar 03 '16 at 02:08
  • Finally found some time to fix this one up a bit. – ridgerunner Aug 20 '16 at 23:59
  • This is good rough validation but 1. [underscores are perfectly legal](https://stackoverflow.com/a/2183140/893918) so `^\w(?:[\w-]{0,61}\w)?$` for the subdomain parts works very well, in fact [srv records require them](https://www.ietf.org/rfc/rfc2782.txt) to avoid collisions with normal subdomains 2. fyi double hyphens are [required for punycode](https://www.ietf.org/rfc/rfc3490.txt) to work. You can of course restrict those validations to certain record types, but you'll have to write a small parser for that or something, which would also allow you to check against a current tld list :) – sg3s Jul 03 '17 at 08:30
  • What do I do with a domain name such as איגוד-האינטרנט.org.il? According to whois this domain name maps to xn----zhcbgfhe2aacg8fb5i.org.il, but that's not the name my users are going to enter. They are going to enter איגוד-האינטרנט.org.il – HairOfTheDog Feb 10 '21 at 19:18
  • Be aware! Subdomains shouldn't start with dash, yet this regex returns true with "-a" – Ricardo Yubal Dec 03 '21 at 02:34
15

You want the first and last characters limited to alphanumeric. What you have now allows the first and last characters to be anything other than dot and dash. This fits the description:

/^[a-zA-Z0-9][a-zA-Z0-9.-]+[a-zA-Z0-9]$/
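
A small sketch comparing the pattern from the question with the one above may make the difference concrete. The variable names original and fixed are illustrative only:

```javascript
// Pattern from the question: first and last characters are "anything but . or -".
const original = /^[^.-][a-zA-Z0-9.-]+[^.-]$/;
// Pattern from this answer: first and last characters must be alphanumeric.
const fixed = /^[a-zA-Z0-9][a-zA-Z0-9.-]+[a-zA-Z0-9]$/;

for (const input of ["john@", "john", "jo.hn", "-john", "john-", "ab"]) {
    console.log(input, original.test(input), fixed.test(input));
}
// john@   true  false   (the "@" is accepted by [^.-] but not by [a-zA-Z0-9])
// john    true  true
// jo.hn   true  true
// -john   false false
// john-   false false
// ab      false false   (both patterns need at least three characters)
```

Because the middle part uses +, both patterns require at least three characters. If shorter names should be accepted, the quantifier would need to change.
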
David Alber
5

Here is a DOMAIN + SUBDOMAIN solution that may help someone else:

   /^([a-zA-Z0-9]([-a-zA-Z0-9]{0,61}[a-zA-Z0-9])?\.)?([a-zA-Z0-9]{1,2}([-a-zA-Z0-9]{0,252}[a-zA-Z0-9])?)\.([a-zA-Z]{2,63})$/

which passes the following chai tests:

const expect = require('chai').expect;

function testDomainValidNamesRegExp(val) {
    let names = /^([a-zA-Z0-9]([-a-zA-Z0-9]{0,61}[a-zA-Z0-9])?\.)?([a-zA-Z0-9]([-a-zA-Z0-9]{0,252}[a-zA-Z0-9])?)\.([a-zA-Z]{2,63})$/;
    return names.test(val);
} 

let validDomainNames = [
    "example.com",
    "try.direct",
    "my-example.com",
    "subdomain.example.com",
    "example.com",
    "example23.com",
    "regexp-1222.org",
    "read-book.net",
    "org.host.org",
    "org.host.org",
    "velmart.shop-products.md",
    "ip2email.terronosp-222.lb",
    "stack.com",
    "sta-ck.com",
    "sta---ck.com",
    "9sta--ck.com",
    "sta--ck9.com",
    "stack99.com",
    "99stack.com",
    "sta99ck.com",
    "sub.do.com",
    "ss.sss-ss.ss",
    "s.sss-ss.ss",
    "s.s-s.ss",
    "test.t.te"
    ];

let invalidDomainNames = [
    "example2.com222",
    "@example.ru:?",
    "example22:89",
    "@jefe@dd.ru@22-",
    "example.net?1222",
    "example.com:8080:",
    ".example.com:8080:",
    "---test.com",
    "$dollars$.gb",
    "sell-.me",
    "open22.the-door@koll.ru",
    "mem-.wer().or%:222",
    "pop().addjocker.lon",
    "regular-l=.heroes?",
    " ecmas cript-8.org ",
    "example.com::%",
    "example:8080",
    "example",
    "examaple.com:*",
    "-test.test.com",
    "-test.com",
    "dd-.test.com",
    "dfgdfg.dfgdf33.e",
    "dddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddd-.test.com",
    "dddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddd.testttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttt.com",
    "d-.test.com"
];

describe("Test Domain Valid Names RegExp", () => {
    validDomainNames.forEach((val) => {
        it(`Text: ${val}`, () => {
            expect(testDomainValidNamesRegExp(val)).to.be.true;
        });
    });
});

describe("Test Domain Invalid Names RegExp", () => {
    invalidDomainNames.forEach((val) => {
        it(`Text: ${val}`, () => {
            expect(testDomainValidNamesRegExp(val)).to.be.false;
        });
    });
});

More tests are very welcome!

Vasili Pascal
  • This worked really well for me. FYI I added the following to the end, in order that it would pick up port numbers if present (a requirement in my case as we are using them locally): (:[0-9]{0,4})? – TPHughes Jul 13 '20 at 08:42
  • How can I validate a subdomain only, i.e. accept the entered text only if it is a subdomain? – Ameer Hamza Dec 28 '22 at 17:49
4

In our project, we match subdomains like this:

Client JS

^([A-Za-z0-9](?:(?:[-A-Za-z0-9]){0,61}[A-Za-z0-9])?(?:\.[A-Za-z0-9](?:(?:[-A-Za-z0-9]){0,61}[A-Za-z0-9])?){2,})$

Server Ruby

\A([A-Za-z0-9](?:(?:[-A-Za-z0-9]){0,61}[A-Za-z0-9])?(?:\.[A-Za-z0-9](?:(?:[-A-Za-z0-9]){0,61}[A-Za-z0-9])?){2,})\z
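
As a quick illustration of what the client-side pattern accepts, the sketch below runs it against a few inputs. HOST_RE is a hypothetical name. Note that the trailing {2,} quantifier means at least three dot-separated parts are required, so a bare "example.com" is not matched:

```javascript
// Client-side pattern from above; HOST_RE is an illustrative name.
const HOST_RE = /^([A-Za-z0-9](?:(?:[-A-Za-z0-9]){0,61}[A-Za-z0-9])?(?:\.[A-Za-z0-9](?:(?:[-A-Za-z0-9]){0,61}[A-Za-z0-9])?){2,})$/;

console.log(HOST_RE.test("sub.example.com"));   // true: three parts
console.log(HOST_RE.test("a.b.c.d"));           // true: four parts
console.log(HOST_RE.test("example.com"));       // false: only two parts
console.log(HOST_RE.test("sub.-example.com"));  // false: part starts with a hyphen
console.log(HOST_RE.test("sub.example-.com"));  // false: part ends with a hyphen
```
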
Daniel Rosenberg
1

Here is a regexp for a sub-domain which:

  • Allows dot (.), underscore (_), and dash (-) within the string
  • Does not allow dot (.), underscore (_), or dash (-) as the first or last character
  • Allows alphanumeric characters in the string

    ^[a-zA-Z0-9]+[a-zA-Z0-9-._]*[a-zA-Z0-9]+$

Correct examples

  • abc.com
  • abc_xyz.com
  • abc.xyz.com
  • abc

Incorrect examples

  • abc.
  • -abc
  • abc-
  • xyz.abc-
  • https://abcxyz.com
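
The examples above can be checked with a short script. This is only a sketch, and the names SUBDOMAIN_RE, correct and incorrect are made up for the illustration. Two extra inputs are included to show edge cases that follow from the pattern itself: a single character is rejected (the pattern needs at least two alphanumerics), and consecutive dots are accepted:

```javascript
// Pattern from above; SUBDOMAIN_RE is an illustrative name.
const SUBDOMAIN_RE = /^[a-zA-Z0-9]+[a-zA-Z0-9-._]*[a-zA-Z0-9]+$/;

const correct = ["abc.com", "abc_xyz.com", "abc.xyz.com", "abc"];
const incorrect = ["abc.", "-abc", "abc-", "xyz.abc-", "https://abcxyz.com"];

console.log(correct.every((s) => SUBDOMAIN_RE.test(s)));    // true
console.log(incorrect.some((s) => SUBDOMAIN_RE.test(s)));   // false

// Edge cases that follow from the pattern's structure:
console.log(SUBDOMAIN_RE.test("a"));     // false: at least two characters are needed
console.log(SUBDOMAIN_RE.test("a..b"));  // true: consecutive dots are not excluded
```
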
coDe murDerer
1

Try this one:

/^[a-zA-Z0-9][a-zA-Z0-9.-]*[a-zA-Z0-9]$/

But the string has to be at least two characters long to match: one [a-zA-Z0-9] at the start and one [a-zA-Z0-9] at the end. To avoid this, you can use this regex:

/^[a-zA-Z0-9][a-zA-Z0-9.-]*$/

But then you have to do an extra check to ensure that the string does not end with a dot or a dash.
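
A minimal sketch of the two-step check, assuming JavaScript. The function name isValidName is hypothetical. For comparison, the sketch also includes a single-pattern variant (singlePattern, also an assumed name) that makes the tail optional, which is a different technique from the one described in this answer:

```javascript
// Step 1: the relaxed pattern from above (start and allowed characters only).
const startAndBody = /^[a-zA-Z0-9][a-zA-Z0-9.-]*$/;

// Step 2: the extra check on the last character. isValidName is an illustrative name.
function isValidName(s) {
    if (!startAndBody.test(s)) {
        return false;
    }
    const last = s[s.length - 1];
    return last !== "." && last !== "-";
}

// Alternative (not from the answer): one pattern with an optional tail,
// so a single character is accepted without a separate check.
const singlePattern = /^[a-zA-Z0-9](?:[a-zA-Z0-9.-]*[a-zA-Z0-9])?$/;

for (const input of ["a", "a-b", "a-", "a.", ".a", ""]) {
    console.log(JSON.stringify(input), isValidName(input), singlePattern.test(input));
}
// "a" and "a-b" pass both checks; the other four inputs fail both.
```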

ckruse
1

Try this regexp: /^[a-zA-Z0-9][a-zA-Z0-9.-]*[a-zA-Z0-9]$/ The problem with your pattern is that [^.-] at the start and end matches any character except '.' or '-', so it is not limited to [a-zA-Z0-9].

Sreenath Nannat
0

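Try this if you want dashes but no dots in the subdomain: /^\w[\w-]+\w$/ (note that \w also matches the underscore, and that this pattern needs at least three characters to match)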

Elad Amsalem
0

I was searching for a regex, but I only needed to check that the origin belongs to the same domain, so this was enough: origin.includes('website.com')
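
For comparison, here is a sketch that compares the parsed hostname instead of searching for a substring. This is a different technique from the one-liner above: includes() also returns true for an origin such as "https://website.com.evil.example", since the text "website.com" appears inside it. The function name isSameDomain is an assumption, and "website.com" is the placeholder domain from the answer:

```javascript
// Compare the hostname of an origin with a domain, allowing subdomains.
// isSameDomain is an illustrative name; URL is the standard WHATWG URL class.
function isSameDomain(origin, domain) {
    let hostname;
    try {
        hostname = new URL(origin).hostname;
    } catch (err) {
        return false; // not a parseable absolute URL
    }
    return hostname === domain || hostname.endsWith("." + domain);
}

console.log(isSameDomain("https://website.com", "website.com"));              // true
console.log(isSameDomain("https://api.website.com:8443", "website.com"));     // true
console.log(isSameDomain("https://website.com.evil.example", "website.com")); // false
console.log("https://website.com.evil.example".includes("website.com"));      // true
```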

sonu sharma
0

You may try this for subdomains:

(^[a-zA-Z0-9][a-zA-Z0-9]*)+(([.][a-zA-Z0-9]+)*([-]+[a-zA-Z0-9]+)*([_]+[a-zA-Z0-9]+)*)*$

Explanation:

(^[a-zA-Z0-9][a-zA-Z0-9]*)+

Starts with an alphanumeric character followed by zero or more alphanumeric characters, at least one time.

([.][a-zA-Z0-9]+)*

Optional: one dot followed by one or more alphanumeric characters.

([-]+[a-zA-Z0-9]+)*

Optional: one or more "-" followed by one or more alphanumeric characters.

([_]+[a-zA-Z0-9]+)*

Optional: one or more "_" followed by one or more alphanumeric characters.

EspressoCode
0

Try this regex:

^(?![-.])[a-zA-Z0-9.-]+(?<![-.])$
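
A short sketch of how this pattern behaves, assuming a JavaScript engine with lookbehind support (ES2018 or later). The name LOOKAROUND_RE is illustrative. The lookahead rejects a leading dot or dash, and the lookbehind rejects a trailing one, so a single alphanumeric character is also accepted:

```javascript
// Pattern from above; requires lookbehind support (ES2018+).
const LOOKAROUND_RE = /^(?![-.])[a-zA-Z0-9.-]+(?<![-.])$/;

console.log(LOOKAROUND_RE.test("a"));      // true: single character is fine
console.log(LOOKAROUND_RE.test("jo.hn"));  // true
console.log(LOOKAROUND_RE.test("john@"));  // false: "@" is not in the allowed set
console.log(LOOKAROUND_RE.test("-john"));  // false: starts with a dash
console.log(LOOKAROUND_RE.test("john."));  // false: ends with a dot
```
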
Prince John Wesley