
I'm trying to write a regex that extracts the subdomain and domain parts of a URL as separate strings.

I've tried this:

/^[^:]+:\/\/([^\.\/]+)(\.[^\.\/]+)+(?:\/|$)/

It should work against these URLs:

http://www.mail.yahoo.co.uk/blah/blah

http://test.test.again.mail.yahoo.com/blah/blah

I want to break each URL into its parts, like so:

["http://", "www", ".mail", ".yahoo", ".co", ".uk"]

["http://", "test", ".test", ".again", ".mail", ".yahoo", ".com"]

Now I'm only able to capture them as:

["http://", "www", ".uk"]

["http://", "test", ".com"]

Anyone know how I can fix my regex?

Kernel James

3 Answers


You can use /(http[s]?:\/\/|\w+(?=\.)|\.\w+)/g. Test it online
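As a minimal sketch of how this pattern could be applied, the snippet below wraps it in a helper. The function name `splitUrlParts` is made up for illustration and is not part of the answer. Note that `\w` does not match a hyphen, and the pattern is not limited to the host, so a dotted file name in the path would also produce matches.

```javascript
// Hypothetical helper: applies the answer's pattern with the global flag.
function splitUrlParts(url) {
  // match() returns null when nothing matches, so fall back to an empty array.
  return url.match(/(http[s]?:\/\/|\w+(?=\.)|\.\w+)/g) || [];
}

console.log(splitUrlParts('http://www.mail.yahoo.co.uk/blah/blah'));
// [ 'http://', 'www', '.mail', '.yahoo', '.co', '.uk' ]
```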

abskmj
  • Could you explain why matching `http://test.test.again.mail.yahoo.com/blah/blah` with `/^[^:]+:\/\/([^\.\/]+)(\.[^\.\/]+)+(?:\/|$)/` yields only two groups? Does `()+` not expand into several groups, so that only one is kept? If I want multiple groups from the same pattern, how should I write it? – Ezio Shiki Aug 19 '17 at 06:25
  • Let me try with a simple example. Consider a mobile number with the digits `98997088567` and these three patterns: `([9])` matches and returns every occurrence of the digit `9`; `([9]+)` matches and returns every run of consecutive `9`s; `([9])+` also matches every run of consecutive `9`s, but the group returns only the last `9` it matched. – abskmj Aug 19 '17 at 07:24
  • Thank you! I understand. :) – Ezio Shiki Aug 19 '17 at 08:35
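To make the point from the comments concrete, here is a small sketch. It first runs the question's original regex to show that the repeated group keeps only its last iteration, then shows one possible workaround: capture the whole repeated run in a single group and split it with a second, global match. The names `lastIterationOnly` and `splitHost` are hypothetical, and this is only one way to do it.

```javascript
const url = 'http://test.test.again.mail.yahoo.com/blah/blah';

// The question's regex: group 2 is repeated, so only its final iteration is kept.
function lastIterationOnly(str) {
  const m = str.match(/^[^:]+:\/\/([^\.\/]+)(\.[^\.\/]+)+(?:\/|$)/);
  return m ? [m[1], m[2]] : [];
}

// Workaround sketch: capture the entire run of ".label" parts as one group,
// then break that run apart with a separate global match.
function splitHost(str) {
  const m = str.match(/^([^:]+:\/\/)([^.\/]+)((?:\.[^.\/]+)+)(?:\/|$)/);
  if (!m) return [];
  return [m[1], m[2], ...m[3].match(/\.[^.\/]+/g)];
}

console.log(lastIterationOnly(url)); // [ 'test', '.com' ]
console.log(splitHost(url));
// [ 'http://', 'test', '.test', '.again', '.mail', '.yahoo', '.com' ]
```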

You can use the regex

(^\w+:\/\/)([^.]+)

to match the protocol and the first label, and then use

\.\w+

to match each of the remaining dot-prefixed parts.

See the code snippet:

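function getSubDomains(str){
    // Capture the protocol and the first label (everything up to the first dot).
    const head = str.match(/(^\w+:\/\/)([^.]+)/);
    if (!head) return [];
    // Collect every remaining ".label"; match() returns null when nothing matches.
    const rest = str.match(/\.\w+/g) || [];
    // head[0] is the full match, so keep only the two captured groups.
    const result = [head[1], head[2], ...rest];
    console.log(result);
    return result;
}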

getSubDomains('http://www.mail.yahoo.co.uk/blah/blah');
getSubDomains('http://test.test.again.mail.yahoo.com/blah/blah');
marvel308

How about chaining the matches from the start of the string by using the sticky flag y?

var str = 'http://test.test.again.mail.yahoo.com/blah/blah';

var res = str.match(/^[a-z]+:\/\/|\.?[^/.\s]+/yig);

console.log(res);
  • ^[a-z]+:\/\/ matches the protocol: start of string, one or more letters a-z, followed by a colon and a double slash.
  • |\.?[^/.\s]+ or, alternatively, an optional dot followed by one or more characters that are not a slash, a dot, or whitespace.
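To illustrate what the sticky flag contributes, the sketch below runs the same pattern with and without `y`. The helper names `splitSticky` and `splitGlobalOnly` are made up for this comparison. With `y`, each match must begin exactly where the previous one ended, so matching stops at the first slash of the path; without it, the engine skips ahead and also picks up the path segments.

```javascript
// Hypothetical helpers; the pattern is the one from the answer above.
function splitSticky(str) {
  // g + y: collect consecutive matches, stop at the first position that fails.
  return str.match(/^[a-z]+:\/\/|\.?[^/.\s]+/giy) || [];
}

function splitGlobalOnly(str) {
  // g alone: the search may skip over non-matching characters such as "/".
  return str.match(/^[a-z]+:\/\/|\.?[^/.\s]+/gi) || [];
}

const str = 'http://test.test.again.mail.yahoo.com/blah/blah';

console.log(splitSticky(str));
// [ 'http://', 'test', '.test', '.again', '.mail', '.yahoo', '.com' ]

console.log(splitGlobalOnly(str));
// [ 'http://', 'test', '.test', '.again', '.mail', '.yahoo', '.com', 'blah', 'blah' ]
```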

See Regex101 demo for more explanation

bobble bubble