I know there are already questions about validating links, but I'm very bad with regex, and I don't know how to check that a user's input (in HTML) matches one of these URLs:

http://www.domain.com/?p=123456abcde

or

http://www.domain.com/doc/123456abcde

I guess it's something like this:

/^(http://)(www)((\.[A-Z0-9][A-Z0-9_-]*).com/?p=((\.[A-Z0-9][A-Z0-9_-]*)

I need a regex for the two URLs. Thanks.

Alex

3 Answers

This might not be a job for regexes, but for existing tools in your language of choice. Regexes are not a magic wand you wave at every problem that happens to involve strings. You probably want to use existing code that has already been written, tested, and debugged.

In PHP, use the parse_url function.

Perl: URI module.

Ruby: URI module.

.NET: 'Uri' class
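
As an illustration of the "parse it, don't regex it" approach, here is a minimal sketch in Python (not one of the languages listed above) using the standard `urllib.parse` module. The helper name `is_expected_url`, the default host, and the assumption that the ID is purely alphanumeric are all hypothetical choices for this example, not something stated in the question.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Assumption: the ID is one or more ASCII letters or digits.
ID_PATTERN = re.compile(r"[A-Za-z0-9]+")


def is_expected_url(url, host="www.domain.com"):
    """Return True if url looks like http://<host>/?p=<id> or http://<host>/doc/<id>."""
    try:
        parts = urlsplit(url)
    except ValueError:
        # urlsplit rejects some malformed inputs, e.g. a broken IPv6 host
        return False

    if parts.scheme != "http" or parts.hostname != host:
        return False

    # Form 1: http://www.domain.com/?p=<id>
    if parts.path == "/" and parts.query:
        params = parse_qs(parts.query, keep_blank_values=True)
        values = params.get("p", [])
        return (
            list(params) == ["p"]          # "p" is the only parameter
            and len(values) == 1           # and it appears exactly once
            and ID_PATTERN.fullmatch(values[0]) is not None
        )

    # Form 2: http://www.domain.com/doc/<id>
    if not parts.query and parts.path.startswith("/doc/"):
        return ID_PATTERN.fullmatch(parts.path[len("/doc/"):]) is not None

    return False
```

The parser handles splitting the URL into scheme, host, path, and query, so the only regex left is the small one describing the ID. Note that this sketch does not check the fragment or the port; add those checks if they matter for your input.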

Andy Lester
  • PHP's `parse_url` function isn't the way to go for this, as stated in the documentation of that very function: 'This function is not meant to validate the given URL, it only breaks it up into the above listed parts.' [`filter_var`](http://php.net/manual/en/function.filter-var.php) seems more appropriate. – PLPeeters Aug 22 '13 at 15:57

This will match both your strings.

(http:\/\/)?(www\.)?([A-Z0-9a-z][A-Z0-9a-z_-]*)\.com\/(\?p=)?([A-Z0-9a-z][\/A-Za-z0-9_-]*)

I highly recommend using a regex checker. You can find one for (almost) every OS, and there are also online ones such as http://regexpal.com/ or http://www.quanetic.com/Regex.
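
If you would rather check a pattern from a script than from a website, a few lines of Python do the same job. This is only a sketch: the pattern is the one from this answer with the dot before `com` escaped, and the `check` helper and the sample list are made up for the example.

```python
import re

# Pattern from this answer, with the dot before "com" escaped.
PATTERN = re.compile(
    r"(http:\/\/)?(www\.)?([A-Z0-9a-z][A-Z0-9a-z_-]*)\.com\/(\?p=)?"
    r"([A-Z0-9a-z][\/A-Za-z0-9_-]*)"
)

# Hypothetical samples: the two URLs from the question plus a few others.
SAMPLES = [
    "http://www.domain.com/?p=123456abcde",
    "http://www.domain.com/doc/123456abcde",
    "http://www.domain.com/foo/bar",
    "https://www.domain.com/doc/123",
    "http://www.domain.co.uk/doc/123",
]


def check(pattern, samples):
    """Map each sample to True if the whole string matches the pattern."""
    return {s: pattern.fullmatch(s) is not None for s in samples}


if __name__ == "__main__":
    for url, ok in check(PATTERN, SAMPLES).items():
        label = "match" if ok else "no match"
        print(f"{label:8}  {url}")
```

Using `fullmatch` rather than `search` means the entire input has to fit the pattern, which is usually what you want for validation. Running samples like these also makes it easy to spot where the pattern is looser than intended: it accepts `http://www.domain.com/foo/bar`, for instance.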

Technoh
  • [http://regex101.com/](http://regex101.com/r/eU5cB7) is good too :) – Enissay Aug 22 '13 at 15:41
  • And a good regex tutorial can be found @ http://www.regular-expressions.info/ – Enissay Aug 22 '13 at 15:42
  • While this does work, it's not generic, doesn't work with .co.uk domains, has a syntax error on the `http://` part which should be `http:\/\/`, doesn't match a potential https, allows underscores in the domain name and also matches URLs that are different from the second one, like _http://www.domain.com/foo/bar_ or even _http://www.domain.com/f_. – PLPeeters Aug 22 '13 at 15:50
  • Agreed, but the OP's question was not generic. With more details I could have built a better regex. I've edited my answer to add `\/\/` though, thanks for pointing it out. – Technoh Aug 22 '13 at 15:52

This will match any URL with a valid domain in the format you specified.

http(s)?:\/\/(www\.)?[a-zA-Z0-9-\.]+\.[a-z]{2,6}\/(\?p=|doc\/)[a-z0-9]+

Replace [a-z]{2,6} with com if you only want .com domains. See it in action here.
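
Here is a small Python sketch of that substitution. The `build_pattern` helper and its `tld` parameter are hypothetical names for this example, and the character class is written as `[a-zA-Z0-9.-]`, which matches the same characters as `[a-zA-Z0-9-\.]` above.

```python
import re


def build_pattern(tld=r"[a-z]{2,6}"):
    """Compile the regex from this answer with a configurable TLD part.

    Pass tld="com" to accept only .com domains.
    """
    return re.compile(
        r"http(s)?:\/\/(www\.)?[a-zA-Z0-9.-]+\." + tld +
        r"\/(\?p=|doc\/)[a-z0-9]+"
    )


def matches(pattern, url):
    """True if the whole url matches the compiled pattern."""
    return pattern.fullmatch(url) is not None


any_tld = build_pattern()        # any 2 to 6 letter TLD
com_only = build_pattern("com")  # .com domains only
```

With `any_tld`, both URLs from the question match, as does an https URL on a `.co.uk` domain; `com_only` rejects the `.co.uk` one. Note that the final `[a-z0-9]+` only accepts lowercase letters and digits, so an ID containing uppercase letters will not match unless you widen that class.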

PLPeeters