
I have the following regex that attempts to match URLs:

/((http|https):(([A-Za-z0-9$_.+!*(),;/?:@&~=-])|%[A-Fa-f0-9]{2}){2,}(#([a-zA-Z0-9][a-zA-Z0-9$_.+!*(),;/?:@&~=%-]*))?([A-Za-z0-9$_+!*();/?:~-]))/g

How can I modify this regex to only match URLs of a single domain?

For example, I only want to match URLs that begin with http://www.google.com.

This should simplify my regex, but I'm too much of a regex noob to get it working (after all these years...)

eoinoc
  • Remember that most RE engines know about `(?:a|b)` (or similar) to match alternatives without generating an unnecessary group. Additionally you could just use `https?` to match both `http` and `https`. – hochl Apr 13 '12 at 10:17
  • Nice tip about `https?`. – eoinoc Apr 15 '12 at 12:25

1 Answer


Did you write that RegEx? I don't know what it's trying to do, but it certainly doesn't match URLs correctly. Here's something it matches:

http:@@#9@?~

which I'm pretty sure isn't a valid URL.
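The false positive can be checked in a JavaScript console. This is only a quick sketch: it assumes the pattern is used as a JavaScript regex literal (its `/.../g` form suggests so), and `urlPattern` is just an illustrative name.

```javascript
// The regex from the question, copied unchanged.
const urlPattern = /((http|https):(([A-Za-z0-9$_.+!*(),;/?:@&~=-])|%[A-Fa-f0-9]{2}){2,}(#([a-zA-Z0-9][a-zA-Z0-9$_.+!*(),;/?:@&~=%-]*))?([A-Za-z0-9$_+!*();/?:~-]))/g;

// With the g flag, match() returns an array of every full match, or null.
const matches = "http:@@#9@?~".match(urlPattern);
console.log(matches);
```

The pattern never requires `//` or a host name after the scheme, which is why a string like this gets through.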

You shouldn't be using RegEx to match URLs like this. You haven't said what language you're working in, but use whatever its equivalent of urlparse is.
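In JavaScript, for instance, the built-in `URL` constructor can play that role. The following is a minimal sketch, assuming an environment where `URL` is available as a global (modern browsers, recent Node.js); `isUrlOnHost` is a hypothetical helper name, not part of any library.

```javascript
// Hypothetical helper: true when `candidate` parses as an http(s) URL
// whose host name is exactly `hostname`.
function isUrlOnHost(candidate, hostname) {
  try {
    const url = new URL(candidate);
    const isHttp = url.protocol === "http:" || url.protocol === "https:";
    return isHttp && url.hostname === hostname;
  } catch (err) {
    // new URL() throws a TypeError when the input cannot be parsed.
    return false;
  }
}

console.log(isUrlOnHost("http://www.google.com/search?q=regex", "www.google.com"));
console.log(isUrlOnHost("http://example.com/", "www.google.com"));
```

Comparing the parsed `hostname` avoids having to encode URL syntax in a regex, although it only checks a string you have already isolated; it does not find URLs inside a larger text.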

Here's a relevant question: How do you validate a URL with a regular expression in Python?

Karl Barker
  • Hi Karl, it's Javascript I'm working in. It's not that I'm validating a URL specifically. Given a chunk of text, I'm trying to find all the URLs I'm interested in (in order to be able to append a string). – eoinoc Apr 15 '12 at 12:25
  • 1
    If you're not interested in validating, this will match anything looking like a URL: `https?://(?:www\.)google\.com[^ ]*` for you to append to... – Karl Barker Apr 15 '12 at 21:03
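
For the append use case described in the comments, a pattern along those lines can be combined with `String.prototype.replace`. This is a rough sketch, not a definitive implementation: `appendToGoogleUrls` and the `&ref=1` suffix are made up for illustration, the redundant non-capturing group around `www\.` is dropped, and `\S*` is used in place of `[^ ]*` so that a match also stops at tabs and newlines.

```javascript
// Hypothetical helper: appends `suffix` to every URL in `text` that
// begins with http://www.google.com or https://www.google.com.
function appendToGoogleUrls(text, suffix) {
  // \S* consumes everything up to the next whitespace character.
  const googleUrl = /https?:\/\/www\.google\.com\S*/g;
  return text.replace(googleUrl, (url) => url + suffix);
}

const input = "See http://www.google.com/search?q=a and http://example.com/x";
console.log(appendToGoogleUrls(input, "&ref=1"));
```

Because the match runs to the next whitespace, trailing punctuation such as a full stop directly after a URL would be treated as part of it, so real text may need an extra trimming step.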