
I want to match a web address with a regex that captures both http://www.google.com and www.google.com, i.e. with and without the protocol.

GEOCHET
shabby

4 Answers


Well, it depends on exactly what you want to capture ("FTP"? "/index.htm"?), because a general URI match based on the RFC standard is very hard, but you could start with:

/^((https?\:\/\/)?([\w\d\-]+\.){2,}([\w\d]{2,})((\/[\w\d\-\.]+)*(\/[\w\d\-]+\.[\w\d]{3,4}(\?.*)?)?)?)$/

Complicated, see?
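To see what this pattern accepts, here is a quick check with Python's `re` module (an assumption: the `/.../` delimiters are dropped and the pattern body is used unchanged). Because the label group `([\w\d\-]+\.)` must repeat at least twice, a bare `google.com` is rejected, and since every path segment needs at least one character, so is a trailing slash.

```python
import re

# The pattern from this answer with the /.../ delimiters removed; body unchanged.
URL_RE = re.compile(
    r"^((https?\:\/\/)?([\w\d\-]+\.){2,}([\w\d]{2,})"
    r"((\/[\w\d\-\.]+)*(\/[\w\d\-]+\.[\w\d]{3,4}(\?.*)?)?)?)$"
)

samples = [
    "http://www.google.com",
    "www.google.com",
    "https://www.google.com/search/index.htm?q=regex",
    "www.google.com/",  # trailing slash: empty path segment is not allowed
    "google.com",       # only one dotted label before the TLD
]

for candidate in samples:
    # match() tries from the start; the pattern's own ^...$ pins both ends.
    print(candidate, "->", bool(URL_RE.match(candidate)))
```

With these samples it should print True for the first three and False for the last two.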

annakata

Try RegexLib.

Mitch Wheat

Read RFC 3986. It is not as easy as you might think. The job is easier if you only have a small set of URLs to parse.
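As an alternative to a hand-written regex (a swapped-in technique, not one this answer prescribes), Python's standard `urllib.parse.urlsplit` can do the splitting. This is a minimal sketch assuming Python; the helper `split_web_address` is hypothetical. Prefixing scheme-less input with `//` turns it into a network-path reference, so the host is read as the authority instead of as a path.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit


def split_web_address(address: str) -> Tuple[str, Optional[str], str]:
    """Split a web address into (scheme, host, path).

    Hypothetical helper. Input without "://" (e.g. "www.google.com") gets a
    "//" prefix so urlsplit() treats the first component as the host rather
    than as a path. For such input the returned scheme is "".
    """
    if "://" not in address:
        address = "//" + address
    parts = urlsplit(address)
    return parts.scheme, parts.hostname, parts.path


for addr in ["http://www.google.com", "www.google.com", "www.google.com/search"]:
    print(addr, "->", split_web_address(addr))
```

`urlsplit` does very little validation, so deciding which schemes and hosts are acceptable is still up to you.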

dirkgently
  • You can get 'good enough' though, so this answer isn't particularly helpful – John Sheehan Mar 09 '09 at 22:16
  • It's about as good an answer as there could be without a full problem specification, to the extent of being a competitor for the top answer. The problem is that few people read the RFCs; having read one and written an IPv6 parser, I know how hard the job is. – dirkgently Mar 10 '09 at 06:22

Why not

/google\.com/

?

It catches http://www.google.com, www.google.com, and even google.com for free! :-)
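The difference between searching for the domain and anchoring the whole string shows up in a short Python sketch (assumptions: Python's `re`; the names `LOOSE` and `STRICT` are hypothetical). The unanchored pattern finds `google.com` anywhere in a string, while a hypothetical stricter variant with an optional `http://`/`https://` group and an optional `www.` accepts only the address itself.

```python
import re

# This answer's pattern: unanchored, so search() finds it anywhere in the text.
LOOSE = re.compile(r"google\.com")

# Hypothetical stricter variant: optional http:// or https://, optional www.,
# then exactly google.com. fullmatch() requires the whole string to match,
# so no ^/$ anchors are needed.
STRICT = re.compile(r"(?:https?://)?(?:www\.)?google\.com")

samples = [
    "http://www.google.com",
    "www.google.com",
    "google.com",
    "I could try searching for this regex on google.com",
]

for s in samples:
    print(f"{s!r}: loose={bool(LOOSE.search(s))} strict={bool(STRICT.fullmatch(s))}")
```

The loose pattern reports a match for all four strings; the strict one rejects the sentence, and would also reject look-alikes such as `google.community` that the loose pattern accepts.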

Igor
  • It also catches "Well I guess I could try searching for this regex on google.com, nah SO is better than google these days. Hmm, I wonder what's for lunch. Mmmm. Bacon" – annakata Feb 20 '09 at 11:01
  • Which, if you enter it in most browsers, will bring you to Google :) – MSalters Feb 20 '09 at 13:58
  • SO is meant to be a reference so that when you search google, you end up here instead of another crappy site. So this question is fine. – John Sheehan Mar 09 '09 at 22:17
  • @John: Please stop making paranoid comments and downvotes. This was a legitimate answer, advising how to match specific domain names (e.g. google.com). – Igor Mar 10 '09 at 08:46