-1

Possible Duplicates:
PHP validation/regex for URL
PHP regex for validating a URL

I am using

(((?:http|https):\/\/[a-zA-Z0-9\/\?=_#&%~-]+(\.[a-zA-Z0-9\/\?=_#&%~-]+)+)|(www(\.[a-zA-Z0-9\/\?=_#&%~-]+){2,}))

to validate URLs in my script.

But my friend told me there is a problem with this URL:

http://www.example.com/example(200)aaaa.rar

How can I add "(" and ")" to my regexp statement?

Are there any other characters I should allow in my regexp?

faressoft

3 Answers

2

PHP already has a built-in way to validate URLs, filter_var, which will work better than your regex (which, as I commented above, allows false positives):

$url = "http://www.example.com/example(200)aaaa.rar";
var_dump(filter_var($url, FILTER_VALIDATE_URL));
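Note that filter_var returns the URL string on success and false on failure, and that FILTER_VALIDATE_URL accepts schemes other than http and https. A minimal sketch of a stricter check, assuming you only want absolute http/https URLs (the helper name isHttpUrl is made up for illustration):

```php
<?php
// Hypothetical helper (not part of PHP): accept only absolute http/https URLs.
function isHttpUrl(string $url): bool
{
    // filter_var() returns the filtered string on success and false on failure.
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return false;
    }
    // FILTER_VALIDATE_URL also accepts other schemes (ftp, mailto, ...),
    // so the scheme is checked explicitly here.
    $scheme = parse_url($url, PHP_URL_SCHEME);
    return in_array(strtolower((string) $scheme), ['http', 'https'], true);
}

var_dump(isHttpUrl('http://www.example.com/example(200)aaaa.rar')); // bool(true)
var_dump(isHttpUrl('www.example.com'));                             // bool(false), no scheme
var_dump(isHttpUrl('ftp://example.com/file.rar'));                  // bool(false), wrong scheme
```

If you also need to accept bare "www." addresses as your regex does, one option is to prepend "http://" before validating, but that is a design choice for your script rather than something filter_var does for you.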
Daniel Vandersluis
0

May I recommend this site: http://regexlib.com/. Click Browse at the top and select the Uri button.

To answer your question, though:

(((?:http|https):\/\/[a-zA-Z0-9\/\?=_#&%~\(\)-]+(\.[a-zA-Z0-9\/\?=_#&%~\(\)-]+)+)|(www(\.[a-zA-Z0-9\/\?=_#&%~\(\)-]+){2,}))

Note the \( and \) added near the end of each character class. Parentheses are used for grouping in regex, so outside a character class they must be escaped with a backslash; inside a character class the escape is optional but harmless.
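For what it's worth, here is a rough sketch of how the question's pattern could be tried out with preg_match once "(" and ")" are allowed in every character class. The ^ and $ anchors, the non-capturing groups, and the helper name matchesUrlPattern are my own assumptions, not part of the original pattern, and the result still has the false-positive and non-ASCII-domain limitations mentioned elsewhere on this page:

```php
<?php
// Characters allowed in each URL part: the question's set plus "(" and ")".
// The hyphen stays last so it is read literally rather than as a range.
$chars = 'a-zA-Z0-9\/\?=_#&%~()-';

// Assumption: ^ and $ are added so the whole string has to match.
$pattern = '/^(?:https?:\/\/[' . $chars . ']+(?:\.[' . $chars . ']+)+'
         . '|www(?:\.[' . $chars . ']+){2,})$/';

// Hypothetical helper name, for illustration only.
function matchesUrlPattern(string $url, string $pattern): bool
{
    return preg_match($pattern, $url) === 1;
}

var_dump(matchesUrlPattern('http://www.example.com/example(200)aaaa.rar', $pattern)); // bool(true)
var_dump(matchesUrlPattern('www.example.com', $pattern));                             // bool(true)
var_dump(matchesUrlPattern('http://example', $pattern));                              // bool(false)
```

The "/" delimiter is used because the pattern already escapes forward slashes.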

Brad Christie
  • It doesn't work http://regexr.com?2simo – faressoft Nov 17 '10 at 18:20
  • That doesn't take into account all those (unfortunately) now valid internationalized domains with non-ASCII characters, though. – TeaDrivenDev Nov 17 '10 at 18:20
  • I'm not 100% familiar with this site's formatting. As such, some characters are missing from the regex, which is why I pointed you to a source that will have the answers, unscathed. @GCATNM: very true, but I don't think (though I may be wrong) they are looking to be _that_ all-inclusive. – Brad Christie Nov 17 '10 at 18:22
  • SyntaxError: unterminated parenthetical – Ismael Dec 12 '13 at 12:00
0

I believe the specification, RFC 2068, will answer your question, though you will need to unpack your BNF boots for the journey.

In summary, pretty much any character can be used after the domain name, except for the few reserved ones, which must be escaped:

The BNF [in the RFC] includes national characters not allowed in valid URLs as specified by RFC 1738, since HTTP servers are not restricted in the set of unreserved characters allowed to represent the rel_path part of addresses, and HTTP proxies may receive requests for URIs not defined by RFC 1738
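As a small illustration of what "escaped" means in practice, here is a sketch using PHP's rawurlencode on the file name from the question. Note the assumption: rawurlencode follows the later RFC 3986 and is deliberately conservative, so it percent-encodes the parentheses as well, even though the original URL is usable without encoding them:

```php
<?php
// rawurlencode() percent-encodes everything except letters, digits
// and the characters "-", "_", "." and "~".
$segment = 'example(200)aaaa.rar';
$encoded = rawurlencode($segment);

echo $encoded, PHP_EOL;                        // example%28200%29aaaa.rar
var_dump(rawurldecode($encoded) === $segment); // bool(true)
```

Encode one path segment at a time; running a whole URL through rawurlencode would also encode the "/" and ":" separators.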

Paul Ruane