
I have a regex to match subdomains of a web page, like the one below:

 "^https://[^/?]+\\.(sub1|sub2\\.)domain\\.com"

What would be the regex to accept any subdomain of domain.com?

Edit:

My question was incomplete; my regex was meant to accept only

 https://[any number of subdomains].sub1domain.com

or

 https://[any number of subdomains].sub2domain.com

Sorry for posting an incomplete question.

Exception

8 Answers


This one should suit your needs:

https?://([a-z0-9]+[.])*sub[12]domain[.]com
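A minimal Python sketch of how this pattern could be applied, assuming Python's re module. Because the pattern itself is not anchored, the sketch uses fullmatch so the whole string must be the URL; the sample hostnames are made up for illustration.

```python
import re

# The answer's pattern, unchanged; [.] is a literal dot.
SUBDOMAIN_RE = re.compile(r"https?://([a-z0-9]+[.])*sub[12]domain[.]com")

def is_allowed(url: str) -> bool:
    # fullmatch requires the entire string to match, so trailing
    # text such as ".evil.example" is rejected.
    return SUBDOMAIN_RE.fullmatch(url) is not None

# Illustrative (made-up) URLs.
print(is_allowed("https://sub1domain.com"))                  # True
print(is_allowed("https://a.b.sub2domain.com"))              # True
print(is_allowed("https://sub3domain.com"))                  # False
print(is_allowed("https://a.sub1domain.com.evil.example"))   # False
```

With re.search or re.match instead of fullmatch, the last URL would be accepted, so the choice of matching function matters as much as the pattern.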


sp00m
  • In this way (as your demo tested), the URL https://sub1.sub2.sub1domain.com returns *sub2*, while I suppose he wants *sub1.sub2*. – BAD_SEED Oct 09 '13 at 14:59
  • @marianoc84 I don't get it... What do you mean by *"return"*? – sp00m Oct 09 '13 at 15:02
  • @sp00m Possibly "captured in $1". – AlexR Oct 13 '13 at 19:51
  • Nice. Keep in mind: `-` is also allowed in subdomains, it's better to escape `/`, and a subdomain should start with a letter, not a number. – stelios Aug 11 '18 at 09:43
  • @chefarov From https://stackoverflow.com/a/7111947/1225328, it looks like it can start with a number. I don't see why it's better to escape `/`; it totally depends on your language/engine. Good point about `-` though: if needed, one could use `([a-z0-9]+(?:[a-z0-9-]*[a-z0-9])?[.])*` instead. – sp00m Aug 20 '18 at 13:27
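A small Python sketch of the two points raised in these comments. A repeated group like ([a-z0-9]+[.])* keeps only its last repetition in group(1), which is why it yields "sub2." rather than "sub1.sub2.". Wrapping the whole repetition in an outer group is one way to capture all of the subdomain labels; that rewrite is an illustration, not part of the answer. The label pattern with hyphens is the one suggested in the last comment.

```python
import re

url = "https://sub1.sub2.sub1domain.com"

# Original pattern: the repeated group only remembers its last iteration.
original = re.fullmatch(r"https?://([a-z0-9]+[.])*sub[12]domain[.]com", url)
print(original.group(1))  # sub2.

# Illustrative variant: an outer group captures every subdomain label,
# and the inner non-capturing label pattern allows hyphens inside a label.
LABEL = r"[a-z0-9]+(?:[a-z0-9-]*[a-z0-9])?"
variant = re.compile(rf"https?://((?:{LABEL}[.])*)sub[12]domain[.]com")

print(variant.fullmatch(url).group(1))                           # sub1.sub2.
print(variant.fullmatch("https://my-app.sub2domain.com") is not None)  # True
```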

Something like:

(http|https)://(.*).domain.com

At this point, the second capture group (i.e. \2 or $2) holds what you need. Note that this regex doesn't validate the URL.

Proof: https://www.debuggex.com/r/3KYGmAnlnBq3C_fT

BAD_SEED
  • Thanks for the answer. Could you please look at my question update? – Exception Oct 09 '13 at 13:30
  • Check now, and let me know! – BAD_SEED Oct 09 '13 at 14:20
  • IMPORTANT: Only use this one if you can always trust the source. This answer insecurely matches something like `"https://totally.bad.url.com/fake/out.domain.com/"`, which could be used for phishing depending on how you're displaying it. The accepted answer does not have this issue. – brainbag Mar 27 '19 at 20:41
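A short Python sketch of both behaviours discussed here, using the answer's pattern unchanged. Group 2 does carry the subdomain part. However, because the dots are unescaped and (.*) can also consume "/", the URL quoted in the last comment matches too, even though its host is not under domain.com.

```python
import re

# The answer's pattern: unescaped dots match any character,
# and (.*) is free to run across "/" into the path.
LOOSE = re.compile(r"(http|https)://(.*).domain.com")

m = LOOSE.match("https://www.sub.domain.com")
print(m.group(2))  # www.sub

# The host here is totally.bad.url.com, yet the pattern still matches
# because ".domain.com" appears later, inside the path.
bad = LOOSE.match("https://totally.bad.url.com/fake/out.domain.com/")
print(bad is not None)  # True
print(bad.group(2))     # totally.bad.url.com/fake/out
```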

I'm assuming that you don't want the subdomains to differ simply by a number. Use this regex:

(^https:\/\/(?:[\w\-]+\.)+(?:subdomain1|subdomain2)\.com)

The single capture group is the full URL. Simply replace subdomain1 and subdomain2 with your actual subdomains.

I tested this on regex101.com
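A minimal Python sketch of the same approach. It assumes the question's sub1domain.com and sub2domain.com as the real targets; TARGETS is a hypothetical name. The dot before com is escaped, and the alternation is built with re.escape so that the names are matched literally.

```python
import re

# Hypothetical target names; replace with your real ones.
TARGETS = ["sub1domain", "sub2domain"]

pattern = re.compile(
    r"^https://(?:[\w-]+\.)+(?:"
    + "|".join(map(re.escape, TARGETS))
    + r")\.com"
)

print(bool(pattern.match("https://a.b.sub1domain.com")))  # True
print(bool(pattern.match("https://sub2domain.com")))      # False: needs at least one subdomain
```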

fred02138

Assuming the subdomains contain only numbers and lowercase letters and you do not want to accept sub-subdomains:

[0-9a-z]*\.domain\.com

Update:

https://.*\.sub[12]domain\.com

matches

https://sub1.sub2.sub1domain.com 
https://sub1.sub1domain.com 

but not

https://sub1domain.com 
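A quick Python check of the examples listed above, with the digit choice written as the character class [12]. The check also shows why that form is preferable. Inside [...], "|" is a literal character rather than alternation, so a class written [1|2] would also accept a "|".

```python
import re

pattern = re.compile(r"https://.*\.sub[12]domain\.com")

print(bool(pattern.match("https://sub1.sub2.sub1domain.com")))  # True
print(bool(pattern.match("https://sub1.sub1domain.com")))       # True
print(bool(pattern.match("https://sub1domain.com")))            # False

# "|" inside a character class is just another allowed character.
print(bool(re.fullmatch(r"sub[1|2]domain", "sub|domain")))  # True
print(bool(re.fullmatch(r"sub[12]domain", "sub|domain")))   # False
```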
Rhand

You would use

"^https://[^/?]+\\.([^.]+)\\.domain\\.com"

which boils down to matching

"[^.]+"

for any subdomain. This matches only the last part of the subdomain (for www.xxx.domain.com, it captures "xxx" in group 1).
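A Python sketch of this pattern, written as a raw string. The doubled backslashes above come from putting the pattern inside an ordinary quoted string literal. The sketch also shows a consequence of the leading [^/?]+: it must consume at least one label before the captured one, so a single-level subdomain does not match.

```python
import re

pattern = re.compile(r"^https://[^/?]+\.([^.]+)\.domain\.com")

m = pattern.match("https://www.xxx.domain.com")
print(m.group(1))  # xxx

# Only one label in front of domain.com: no match.
print(pattern.match("https://xxx.domain.com"))  # None
```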

AlexR

Try the pattern http://([^.]+\\.)+sub[12]domain\\.com; a great place for testing out regexes with minimal setup pain is RegexPlanet.
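A quick Python check of this answer's pattern, written as a raw string so that a single backslash escapes each dot. Comparing it with a copy whose last dot is left unescaped shows why the escape matters: an unescaped "." accepts any character.

```python
import re

# Every dot escaped.
escaped = re.compile(r"http://([^.]+\.)+sub[12]domain\.com")
# Same pattern with the dot before "com" left unescaped.
unescaped = re.compile(r"http://([^.]+\.)+sub[12]domain.com")

print(bool(escaped.fullmatch("http://a.b.sub2domain.com")))  # True
print(bool(escaped.fullmatch("http://a.sub2domainXcom")))    # False
print(bool(unescaped.fullmatch("http://a.sub2domainXcom")))  # True
```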

Josh

Here is a regex that matches any number of subdomains, also allowing IDN domains (in their ASCII punycode form), and checks that each label is 63 characters or fewer. It also checks that a hyphen (-) is not in the first or last position of a label.

https?://([a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?[.])*sub[12]domain[.]com/
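A Python sketch of the per-label rule this answer describes: 1 to 63 characters from [a-z0-9-], with no hyphen at either end. It uses {0,61} in the middle so that two-character labels such as "ab" are accepted. It assumes the question's sub1domain.com/sub2domain.com targets and keeps the trailing "/" from the answer.

```python
import re

# One DNS label: 1-63 chars of [a-z0-9-], no hyphen at either end
# (1 leading char + up to 61 middle chars + 1 trailing char).
LABEL = r"[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?"
pattern = re.compile(rf"https?://(?:{LABEL}[.])*sub[12]domain[.]com/")

print(bool(pattern.fullmatch("https://ab.sub1domain.com/")))                # True
print(bool(pattern.fullmatch("https://-ab.sub1domain.com/")))               # False
print(bool(pattern.fullmatch("https://" + "a" * 63 + ".sub2domain.com/")))  # True
print(bool(pattern.fullmatch("https://" + "a" * 64 + ".sub2domain.com/")))  # False
```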
SkateScout

The forward slashes // at the start need to be escaped when the regex is written as a /.../ literal (as in JavaScript), so the correct form there is:

https?:\/\/([a-z0-9]+[.])*sub[12]domain[.]com
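Whether "/" must be escaped depends on how the pattern is written. It matters inside a /.../ literal in JavaScript. In a language where patterns are ordinary strings, such as Python (used here purely for illustration), "\/" is accepted and behaves the same as a plain "/".

```python
import re

# Python patterns are plain strings, not /.../ literals, so "/" needs no
# escaping; "\/" is still allowed and matches a literal slash.
plain = re.compile(r"https?://([a-z0-9]+[.])*sub[12]domain[.]com")
escaped = re.compile(r"https?:\/\/([a-z0-9]+[.])*sub[12]domain[.]com")

url = "https://a.sub1domain.com"
print(bool(plain.fullmatch(url)), bool(escaped.fullmatch(url)))  # True True
```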