I'm trying to catch email addresses and URLs such as "www.house22.com" or "info@house22.com".

So far, I have the following regexes:

```php
$r['QurlA'] = '/^(.*)(?<!\pL)
    (?<QurlA>(?:[a-z]{2,}:\/\/)(?:\w+(?::\w+)?@)?(?:[a-z_-]+[.])+[a-z]{2,}(?::\d+)?(?:\/(?:\S*))?)(?<![.])
    (?<_>\s*)(.*)$/uxi';
$r['QurlB'] = '/^(.*)(?<![\pL.\/\\\\@-])
    (?<QurlB>(?:\S+:\S*@)?(?:[a-zA-Z][a-z_-]+[.])+[a-z]{1,}(?:\/(?:\S*))?)(?<![.])(?![\pL@(+-]|[.]\S)
    (?<_>\s*)(.*)$/ux';
```

URLs without digits work well: "www.house.com" and "info@house.com" are found without any problem. But when the name ends in digits, as in "www.house22.com", the URL is not recognized. Why is this? Can anybody spot my mistake?
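One way to see where the digits get lost is to pull the domain part out of QurlB and test it on its own, anchored with `^` and `$`. The label class `[a-z_-]+` (QurlA uses the same class) contains no `0-9`, so a label such as `house22` can never be consumed; the TLD `com` has no digits and is not what fails here. The variable names and the digit-enabled variant below are hypothetical, for illustration only.

```php
<?php
// Domain portion of the question's QurlB pattern, isolated for testing:
// one or more labels "[a-zA-Z][a-z_-]+." followed by a TLD "[a-z]{1,}".
// Neither character class contains a digit.
$domainOriginal = '/^(?:[a-zA-Z][a-z_-]+[.])+[a-z]{1,}$/';

// Hypothetical variant: 0-9 added to the label class, so labels such as
// "house22" are accepted (the first character must still be a letter).
$domainWithDigits = '/^(?:[a-zA-Z][a-z0-9_-]+[.])+[a-z]{1,}$/';

foreach (['www.house.com', 'www.house22.com'] as $host) {
    // preg_match() returns 1 on a match and 0 when there is none.
    printf(
        "%-16s original: %d  with digits: %d\n",
        $host,
        preg_match($domainOriginal, $host),
        preg_match($domainWithDigits, $host)
    );
}
```

With the original class, `www.house22.com` has no valid split into labels plus TLD, so backtracking gives up and the whole match fails.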

Thank you very much!

StMan
  • Possible duplicate of [What is a good regular expression to match a URL?](https://stackoverflow.com/questions/3809401/what-is-a-good-regular-expression-to-match-a-url) – c2huc2hu Aug 03 '18 at 13:31
  • Also https://stackoverflow.com/questions/201323/how-to-validate-an-email-address-using-a-regular-expression – c2huc2hu Aug 03 '18 at 13:31
  • I'd like to discourage you from hand-crafting regexes to match email addresses. Would [this](http://www.ex-parrot.com/~pdw/Mail-RFC822-Address.html) help? URLs too can be quite complex (e.g. credentials for FTP URLs, UTF-8 domain-names, etc.) – Aaron Aug 03 '18 at 13:32
  • Anyway, in your first regex `[a-z]{2,}` just before `(?::\d+)?` is the part that matches the TLD, and in the second it's the only occurrence of `[a-z]{1,}`. Change them into something that accepts digits to solve your immediate problem (and probably jump into the next one) – Aaron Aug 03 '18 at 13:36
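A minimal sketch of the immediate fix, assuming only the question's QurlB pattern needs patching: add `0-9` to the domain-label class (`[a-z_-]` becomes `[a-z0-9_-]`) and leave everything else unchanged. Digits in the TLD class are not needed for `www.house22.com`. The sample sentences are made up. For e-mail addresses, PHP's built-in `filter_var()` with `FILTER_VALIDATE_EMAIL` is shown as a swapped-in alternative to a hand-written regex, in the spirit of the advice above; it validates a whole string and does not search running text.

```php
<?php
$r = [];
// Sketch only: the question's QurlB with 0-9 added to the label class.
$r['QurlB'] = '/^(.*)(?<![\pL.\/\\\\@-])
    (?<QurlB>(?:\S+:\S*@)?(?:[a-zA-Z][a-z0-9_-]+[.])+[a-z]{1,}(?:\/(?:\S*))?)(?<![.])(?![\pL@(+-]|[.]\S)
    (?<_>\s*)(.*)$/ux';

foreach (['visit www.house.com now', 'visit www.house22.com now'] as $text) {
    // The named group "QurlB" holds the URL when the pattern matches.
    if (preg_match($r['QurlB'], $text, $m)) {
        echo "found: {$m['QurlB']}\n";
    } else {
        echo "no URL in: $text\n";
    }
}

// Alternative for e-mail addresses: PHP's built-in validator.
// Returns the address on success and false on failure.
var_dump(filter_var('info@house22.com', FILTER_VALIDATE_EMAIL));
```

QurlA would need the same change to its `(?:[a-z_-]+[.])+` group for scheme-prefixed URLs such as `http://www.house22.com`.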
