JavaScript regex that gets all subdomains

Question

I have the following RegEx:

[!?\.](.*)\.example\.com

and this sample string:

test foo abc.def.example.com bar ghi.jkl.example.com def

I want the RegEx to produce the following matches: def.example.com and jkl.example.com. What do I have to change? It should work for all subdomains of example.com. If possible, it should only take the first subdomain level (abc.def.example.com -> def.example.com).

Tested it on regexpal, not fully working :(

I think you meant `(?<!\.)` instead of `[!?\.]`. `(?<!)` is a negative lookbehind, unfortunately it isn't supported in Javascript. `[!?\.]` will match `!` or `?` or `.`, basically it's the same as `(?:!|\?|\.)`. — HamZa, Jul 15 '13 at 14:50

score 10 · Answer 1 · answered Jul 15 '13 at 14:38


You may use the following expression: [^.\s]+\.example\.com

Explanation

[^.\s]+ : match anything except a dot or whitespace one or more times
\.example\.com : match example.com

Note that you don't need to escape a dot in a character class

HamZa

Awesome, thanks! How can I get all matches of this regex in a string via JavaScript? `str = 'test abc.def.example.com and ghi.jkl.example.com usw.'; str.match('[^.\s]+\.example\.com');` shows me a single match... – fnkr Jul 15 '13 at 14:45

@fnkr add a `g` flag (for global): `str.match(/[^.\s]+\.example\.com/g)` => no quotes, but slashes and a `g` outside the regex delimiting `/` [same rules apply to replacing substrings](http://stackoverflow.com/questions/832257/javascript-multiple-replace/9514142#9514142) – Elias Van Ootegem Jul 15 '13 at 14:46

@fnkr: `str.match(/[^.\s]+\.example\.com/g);` returns an array `[def.example.com, jkl.example.com]` – Elias Van Ootegem Jul 15 '13 at 14:48
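The single match reported in the comment above has two causes, which a short sketch can make visible: the pattern was passed as a quoted string, where JavaScript drops the backslashes in `\s` and `\.`, and there was no `g` flag. The sample string is the one from the comment. The variable names are illustrative only, and `matchAll` assumes an ES2020 runtime.

```javascript
const str = 'test abc.def.example.com and ghi.jkl.example.com usw.';

// Inside a plain string literal, "\s" and "\." are not recognised escapes,
// so the backslashes are dropped and the pattern silently changes.
const asString = '[^.\s]+\.example\.com';
console.log(asString); // → [^.s]+.example.com

// A regex literal keeps the escapes; the g flag asks for every match.
const all = str.match(/[^.\s]+\.example\.com/g);
console.log(all); // → [ 'def.example.com', 'jkl.example.com' ]

// matchAll also needs the g flag and yields one match object per hit.
const viaMatchAll = Array.from(
  str.matchAll(/[^.\s]+\.example\.com/g),
  (m) => m[0]
);
console.log(viaMatchAll); // → [ 'def.example.com', 'jkl.example.com' ]
```

If the pattern has to be built from a string, the backslashes would need doubling (`'[^.\\s]+\\.example\\.com'`) and the flag passed separately, as in `new RegExp(source, 'g')`.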

score 4 · Accepted Answer · answered Jul 15 '13 at 15:57

Just on a side note, while HamZa's answer works for your current sample code, if you need to make sure that the domain names are also valid, you might want to try a different approach, since [^.\s]+ will match ANY character that is not a space or a . (for example, that regex will match jk&^%&*(l.example.com as a "valid" subdomain).

Since there are far fewer valid characters for domain name values than there are invalid ones, you might consider using an "additive" approach to the regex, rather than a subtractive one. This pattern is probably the one you are looking for to match valid domain names: /(?:[\s.])([a-z0-9][a-z0-9-]+[a-z0-9]\.example\.com)/gi

To break it down a little more . . .

(?:[\s.]) - matches the space or . that marks the beginning of the lowest-level subdomain
([a-z0-9][a-z0-9-]+[a-z0-9]\.example\.com) - this captures a group of letters, numbers or dashes that must begin and end with a letter or a number (domain name rules), followed by the example.com domain.
gi - the g flag makes the pattern global (it finds every match, not just the first) and the i flag makes it case insensitive
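One consequence of the breakdown above is worth checking: because the label part is written as one character, then one or more characters, then one more character, it only matches labels of three or more characters. The sketch below compares it with a relaxed variant; that variant, the sample string, and the `capture` helper are assumptions for illustration and not part of the original answer.

```javascript
// Pattern as given in the answer: the label needs at least three characters.
const original = /(?:[\s.])([a-z0-9][a-z0-9-]+[a-z0-9]\.example\.com)/gi;

// Assumed variant: everything after the first character is optional,
// so one- and two-character labels match as well.
const relaxed = /(?:[\s.])([a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.example\.com)/gi;

// Hypothetical helper: collect capture group 1 from every match.
const capture = (re, text) => Array.from(text.matchAll(re), (m) => m[1]);

const sample = "see a.example.com, ab.example.com and abc.example.com";

console.log(capture(original, sample));
// → [ 'abc.example.com' ]
console.log(capture(relaxed, sample));
// → [ 'a.example.com', 'ab.example.com', 'abc.example.com' ]
```

Both patterns still require a space or a dot in front of the label, so a subdomain at the very start of the string would not be picked up by either one.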

At this point, it is simply a question of grabbing the matches. With the g flag, .match() returns only the full matches (which here include the leading space or dot) and drops the captured groups, so use .exec() instead:

var domainString = "test foo abc.def.example.com bar ghi.jkl.example.com def";
var regDomainPattern = /(?:[\s.])([a-z0-9][a-z0-9-]+[a-z0-9]\.example\.com)/gi;
var aMatchedDomainStrings = [];
var patternMatch;

// loop as long as .exec() still finds a match, and take index 1 of each result (the first capturing group; the non-capturing group is not stored)
while (null != (patternMatch = regDomainPattern.exec(domainString))) {
    aMatchedDomainStrings.push(patternMatch[1]);
}

At that point aMatchedDomainStrings should contain all of your valid, first-level subdomains.

var domainString = "test foo abc.def.example.com bar ghi.jkl.example.com def";

. . . should get you: def.example.com and jkl.example.com, while:

var domainString = "test foo abc.def.example.com bar ghi.jk&^%&*(l.example.com def";

. . . should get you only: def.example.com
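For reference, the same loop can be written more compactly where `String.prototype.matchAll` is available (ES2020). This is only a sketch under that assumption; the function name `extractSubdomains` is hypothetical, and the pattern is the one from this answer, unchanged.

```javascript
// Hypothetical helper: returns the first-level subdomains of example.com found in text.
function extractSubdomains(text) {
  const pattern = /(?:[\s.])([a-z0-9][a-z0-9-]+[a-z0-9]\.example\.com)/gi;
  // matchAll yields one result per match; index 1 is the capturing group.
  return Array.from(text.matchAll(pattern), (m) => m[1]);
}

console.log(extractSubdomains("test foo abc.def.example.com bar ghi.jkl.example.com def"));
// → [ 'def.example.com', 'jkl.example.com' ]

console.log(extractSubdomains("test foo abc.def.example.com bar ghi.jk&^%&*(l.example.com def"));
// → [ 'def.example.com' ]
```

Creating the pattern inside the function means each call starts with a fresh `lastIndex`, which avoids the stale-state surprises a shared global regex can cause with `.exec()`.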

I don't want to ruin the mood, but note that domain names support way more than only letters, digits and hyphens. Look, for example, at this domain `http://aa®.com`, not to forget UTF8 domain names like `http://سجل.السعودية` :p – HamZa, Jul 16 '13 at 08:40
@HamZa - Not sure we really want to get into a DNS vs. IDNA discussion in the comment section of this question. :) In the end, though, it still wouldn't change my point anyway . . . rather than allowing any character except a space or `.` (which would definitely allow for invalid domain characters), if he wants to match for validity, he will need to identify the characters he wants to allow and set up the pattern match accordingly . . . whether he wants to use DNS or IDNA standards is up to him. ;) – talemyn, Jul 16 '13 at 15:49
