How would I get a specific portion of a regex expression?

Question

I'm trying to get the domain name of a URL. I could run a series of if statements that check what the URL contains, but ideally I'd use a regex.

The following regex, `^[^.]*:[\/]{0,2}[w]{0,3}[.]{0,1}[\w]*.[\w\W]*$`, does enough of what I want.

This applies to: `https://www.google.com`, `http://www.google.com`, and `www.google.com`.

Now I just want to get `google.com` from this regex, but I'm unsure how to do that.

Learn about [capturing groups](https://stackoverflow.com/questions/432493/how-do-you-access-the-matched-groups-in-a-javascript-regular-expression). — PM 77-1, Nov 30 '17 at 21:24
The exact syntax depends on what language you're using but you put the portion you want in a group using brackets. `a(b)c` will match `abc` and return `b` as the value of the first group. — Dom Weldon, Nov 30 '17 at 21:25
@DomWeldon It is tagged »javascript«, so the language is defined… — philipp, Nov 30 '17 at 21:39

philipp · Accepted Answer · 2017-11-30T21:37:14.333

Referring to the comment by @PM 77-1:

`RegExp.prototype.exec()` (mdn-docs) returns a result array whose indices correspond to the »capturing groups« in your expression:

var
  input = 'Hello',
  finder = /^(H)ell(o)/m,
  match = finder.exec(input);

console.log(match) // ["Hello", "H", "o"]

Index 0 is the whole match; each following item holds the result of one capturing group. Groups are created by (…) in the regular expression and are numbered from left to right by the position of their opening parenthesis.
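
Applied to the URLs from the question, the same idea can be sketched as follows. This is a minimal sketch, not the asker's regex: `domainPattern` and `getDomain` are hypothetical names, and the pattern assumes an optional scheme, an optional `www.` prefix, and a host that ends at the first `/`, `:`, `?`, `#`, or whitespace.

```javascript
// Hypothetical pattern (an assumption, not taken from the question):
// (?:...) groups are non-capturing, so the host is the only captured group (index 1).
const domainPattern = /^(?:[a-z][a-z0-9+.-]*:\/\/)?(?:www\.)?([^\/:?#\s]+)/i;

// Hypothetical helper: returns the captured host, or null when nothing matches.
function getDomain(url) {
  const match = domainPattern.exec(url);
  return match ? match[1] : null;
}

console.log(getDomain('https://www.google.com')); // google.com
console.log(getDomain('http://www.google.com'));  // google.com
console.log(getDomain('www.google.com'));         // google.com
```

Making the scheme and `www.` groups non-capturing keeps the wanted part at index 1 regardless of which prefix is present.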

score -1 · Answer 2 · answered Nov 30 '17 at 21:28

If I'm understanding your question correctly, try this: `^([^.]*:[\/]{0,2}[w]{0,3}[.]{0,1}){0,1}[\w]*.[\w\W]*$`

To explain:

I grouped the part that matches the protocol and the www prefix, `[^.]*:[\/]{0,2}[w]{0,3}[.]{0,1}`, by wrapping it in parentheses `(...)`, and made the whole group optional by adding the quantifier `{0,1}` (zero or one times).
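
To see what the optional group captures, here is a minimal sketch that uses the regex from this answer unchanged; the helper name `splitPrefix` is hypothetical. When an optional group does not take part in a match, JavaScript reports it as `undefined`, so the sketch falls back to an empty string.

```javascript
// The regex from this answer, copied unchanged.
const pattern = /^([^.]*:[\/]{0,2}[w]{0,3}[.]{0,1}){0,1}[\w]*.[\w\W]*$/;

// Hypothetical helper: splits a URL into the optional prefix (group 1) and the rest.
function splitPrefix(url) {
  const match = pattern.exec(url);
  if (match === null) return null;
  const prefix = match[1] || ''; // group 1 is undefined when the prefix is absent
  return { prefix, rest: url.slice(prefix.length) };
}

console.log(splitPrefix('https://www.google.com')); // { prefix: 'https://www.', rest: 'google.com' }
console.log(splitPrefix('www.google.com'));         // { prefix: '', rest: 'www.google.com' }
```

Note that without a scheme the `www.` stays in `rest`, because the optional group requires a `:` before it can match.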

Cristian C.

your regex doesn't work, it's accepting basically anything, e.g. `https:a123zsdca.sdkfj//google.com` – Test Nov 30 '17 at 21:34
`^([^.]*:[\/]{0,2}[w]{0,3}[.]{0,1}){0,1}[\w]*.[a-zA-Z]*$` Try that one. However, if you want to do URL matching, this has a few flaws I can see right off the bat. That being said, according to your spec, that should work. – Cristian C. Nov 30 '17 at 21:44
That does not work either. It literally accepts `a112z`. But anyway my regex is fine for my purposes, and I will be using capturing groups. – Test Nov 30 '17 at 21:46
Right, but that was never part of your spec. The regex you provided matches anything beginning with `http://www.`; it can be, as you stated above, `http://www.a123zsdca.sdkfj//google.com`. If you've found a way that suits you, then I'm glad. I won't pretend to know what you need this for, but like I said above, there are a lot of holes in the regex you provided if your intended purpose is URL matching. – Cristian C. Nov 30 '17 at 21:49
I specified the purpose of the regex is to access the domain name, e.g. google.com, facebook.com, etc and get rid of the prefix and suffix, which would be https://, http://. I don't care about what's after domain name, since anything after the period is scrubbed. I simply need the prefix to function correctly, and then capture everything leading up to the first space, and I can now do that. Thanks for the suggestions though. – Test Nov 30 '17 at 21:52
